...ANNOUNCING EEC-1™...

The First Affordable Parallel Computer 


for only $12,950*. 


4 node parallel computer with 1 MBytes per node 
(up to 16 nodes with 4 MBytes per node) 



50 MIPS (up to 200 MIPS)

10 MFLOPS (up to 40 MFLOPS)


PWORKS, graphical parallel
workstation environment


High-resolution color graphics: 1024x768,
256 colors out of 16.7 million palette


OSF/Motif, X Window



EXPRESS, 
parallel programming 
environment: 
parallel debugger, 
performance monitor, 
parallel graphics library 


The EEC-1 Parallel Workstation 


The EEC-1 delivers performance at a cost that makes innovative parallel processing technology affordable today for everyone’s lab 
and desk. An integrated, user-friendly, graphical parallel processing environment provides both a turnkey system for those 
interested in cost-performance and a flexible, cost-effective parallel processing workbench for students, teachers, researchers and 
developers. 


COST-PERFORMANCE - priced at $250/MIP for extra 
low-cost processing power yet benchmarks show that a 
four node EEC-1 outperforms 8 or 16 nodes of some 
significantly more expensive parallel machines 

APPLICATIONS - neural networks, finite element 
structural analysis, image processing and others 


PORTABILITY - supports development of parallel 
programs portable to more expensive machines 
such as NCUBE, IPSC, Connection Machine 
and Cray supercomputers 

NETWORKING - may be used as a parallel 
processing server on an Ethernet with Sun 
workstations, IBM PC’s and others with X11


* Special introductory US price of $12,950 (academic) and $14,950 (non-academic) for orders placed prior to 1 September 1990. 
See us at IJCNN 90, ICPP 90 and Supercomputing90. (Call for international prices) 



EE International Computer Corporation 
77 Oak Knoll Avenue Suite 104 
Pasadena, California 91101 
(818) 793-8255 Fax: (818) 793-0994 


TAIWAN: EXARTECH Corp. 02-537-2201 
KOREA: Sangsoo Electronics 02-780-5360 

EUROPE: EE International Computer Corporation 
Amsterdam 31-2-50-33-19-86 


EEC-1 is a trademark of EE International Computer Corporation. All other trademarks or registered trademarks are the property of their registered holders. 

Attend one of our parallel processing courses. Contact us for the current schedule near you.
NEW FOR NETWORK ANALYSTS AND SYSTEMS ENGINEERS



Realistic simulation of your network or embedded computer
system - quick results, no programming


NETWORK II.5 now predicts performance of 
computer-communication systems 

Free trial and, if you act now, free training 


NETWORK II.5 uses
simulation to predict your
network performance. You simply
describe your network and workload.

Animated simulation follows immediately,
with no programming delays.

Easy-to-understand results 

You get an animated picture of 
your network. System bottlenecks 
and changing levels of utilization 
are apparent. 

You can simulate some portions 
of the network at a detailed level 
and others at a coarser level. 

Your reports show response
times, messages delivered, messages
lost, device utilization, and queueing
statistics.

Computers with NETWORK II.5 

NETWORK II.5 is available for 
most PC’s, Workstations, and 
Mainframes. 


Your network simulated 

You can analyze any embedded
computer system or other
computer-communication network.
Industry standard protocols such as
FDDI and IEEE Standard 802.X
are built in. Others can be modeled.

You can easily study the effect of 
changing network parameters or 
even network protocols. 

Seeing your network animated
increases everyone’s understanding of
its operation and builds confidence
in your results.

Free trial information 

The free trial contains everything
you need to try NETWORK II.5®
on your computer. For a limited
time we also include free training,
at no cost and with no obligation.

Call Paul Gorman at (619) 
457-9681, Fax (619) 457-1184. In 
Europe, call Nigel McNamara on 
(081) 332-0122, Fax (081) 332-0112. 
In Canada, call (613) 747-7467, Fax 
(613) 747-2224. 


Free trial offer 

See for yourself how NETWORK II.5
quickly answers network performance
questions.

Limited offer: act now for free training.

Send details on your University Offer.







Return to: IEEE COMP 

CACI Products Company 
3344 North Torrey Pines Court 
La Jolla, California 92037 

Call Paul Gorman at (619) 457-9681. 

Fax (619) 457-1184. 

In Europe: 

CACI Products Division 
Palm Ct., 4 Heron Square 
Richmond-Upon-Thames 
Surrey TW9 1EW, UK 

Call Nigel McNamara on (081) 332-0122. 

Fax (081) 332-0112. 

In Canada: 

CACI Products Company 
1545 Carling Ave. 

Ottawa, Ontario, K1Z 8P9 
THE BEST USE
IN TOWN NOW 



The OPEN LOOK™ user interface. 

It's a real hit with independent software 
vendors, in-house developers and end 
users. In fact, over 300 applications are 
in development today. By people like 
Lotus,® INFORMIX,® Island Graphics,® 
Interleaf,® and Frame.® And it's the most 
popular front end to UNIX.® For a 
number of reasons. 

First of all, it makes UNIX easy to use.
Because there are no complicated UNIX
commands. It also looks better than any
other interface. From its icons to its 3D
elements. And makes users more efficient.
For example, our drag and drop
feature gives them a simple, intuitive
way to move files around the desktop.
Our push-pin icon makes it even easier
to use. And OPEN LOOK gives users
the same interface across multiple platforms,
so they learn it once. And enjoy
access to a huge range of network
resources.

As a developer, you'll see it's also the
easiest to work with. Because it's part of
OpenWindows,™ a complete development
environment. With the tools you
need to create applications faster than



©1990 Sun Microsystems, Inc. ®Sun Microsystems and the Sun logo are registered trademarks of Sun Microsystems, Inc. OPEN LOOK is a trademark of AT&T. All other products or s 
ever. And ready-made features, like our
DeskSet™ graphical productivity tools,
that you can give users right away.

Of course, the business reasons to
choose OPEN LOOK are just as strong.
OPEN LOOK is the standard interface
of AT&T's UNIX System V.4, so it's
included at no charge. And it will run on
over 20 platforms, including DEC,® HP,®
and IBM.® Since it's portable across
multiple platforms, you only write your
application once. Which saves thousands
of man-hours. Finally, with OPEN
LOOK, you have the full support of
a company that leads the workstation
industry in worldwide shipments.*

We've put together a videotape that 
shows you exactly what OPEN LOOK is 
all about. Just call us at 1-800-624-8999 
(ext. 2068), and we'll send you a 
free copy. 

Then find a nice comfortable seat 
close to your screen. Because the closer 
you look, the better we get. 

Sun Microsystems

Reader Service Number 2

*Source: International Data Corporation, 1990. 36.3% market share.
















Treasurer’s MESSAGE 


1989 financial results 



The IEEE Computer Society recorded
a deficit in 1989, the first since 1985. The
audited financial statements show that
expenditures for society operations exceeded
income by $318,100. Though
disturbing, this result was considerably
better than the $670,000 deficit that
had been approved by the Board of Governors.

As in past years, the society increased
member services substantially in 1989,
including a new publication, IEEE Transactions
on Parallel and Distributed Computing,
and the addition of videotape
products to offerings from the Computer
Society Press. Furthermore, investments
were made in the CS Press operations to
expand its production capacity.

The bright spots last year include
healthy growth of 6.1 percent in membership
and of 13 percent in periodicals net income.
These were not sufficient, however,
to offset a downturn in CS Press sales and
a modest decline in conference net income.
For CS Press, 1989 was a year of
transition and rebuilding as the press staff
were consolidated in the Publications Office
in Los Alamitos, California. Projections
for 1990 anticipate a much better
year for the CS Press.

The 1989 deficit of $318,100 is relatively
small, only 1.8 percent of total
1989 expenditures, but it is of great concern
to the society’s leadership. The society’s
net worth (or fund balance) of $2.7
million consists primarily of fixed
assets; as a result, even small deficits
strain liquidity. The president, president-elect,
and treasurer, working with senior
professional staff, have developed an aggressive
plan to reduce expenses and enhance
income to rebuild the society’s liquid
reserve to a level appropriate for a
rapidly growing, $20-million organization.
The initial goal of 5 percent of the
operating budget for such a reserve is
modest, but critical.

Following this report are the audited
financial statements as prepared by the
Computer Society auditors, Coopers &
Lybrand. A close examination of the
statements and of Figures 1 and 2 reveals
some interesting facts about the society.
The society has balanced, diverse
sources of income and is not overly dependent
on membership fees, which accounted
for only 7.3 percent of total income
in 1989. The deficit was created by
increases in expenses, which outpaced
increases in income by only 0.9 percent in
1989. Overall, the majority of the society’s
funds go directly into member services
and the organization required to
support those services. The proportion of
the total budget for staff salaries and
benefits remains somewhat smaller than
is typical for similar organizations.
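The percentages quoted in this message can be checked against the audited totals. The following is a minimal sketch, not part of the audited report: the dollar amounts are copied from the statement of revenue and expenses, and the variable names are illustrative assumptions.

```python
# Figures copied from the audited 1989 statement of revenue and expenses.
# Variable names are illustrative only.
total_revenue = 17_125_500
total_expenses = 17_443_600
membership_fees = 1_256_000
opening_fund_balance = 3_011_300

# Deficit: the amount by which expenses exceeded revenue.
deficit = total_expenses - total_revenue

# Shares quoted in the treasurer's message.
deficit_share_of_expenses = deficit / total_expenses
membership_share_of_revenue = membership_fees / total_revenue

# The fund balance falls by exactly the deficit.
closing_fund_balance = opening_fund_balance - deficit

print(f"Deficit: ${deficit:,}")                                       # $318,100
print(f"Deficit / expenses: {deficit_share_of_expenses:.1%}")         # 1.8%
print(f"Membership fees / revenue: {membership_share_of_revenue:.1%}")  # 7.3%
print(f"Closing fund balance: ${closing_fund_balance:,}")             # $2,693,200
```

The sketch reproduces the $318,100 deficit, the 1.8 percent and 7.3 percent ratios, and the roughly $2.7 million fund balance. It does not attempt the 0.9 percent growth comparison, whose basis is not stated in the message.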

In summary, the Computer Society
once again showed strong growth in both
member services and income in 1989.
However, for the long-term financial
health of the society, the rate of growth of
expenses must be brought below the
growth rate of income, and adequate levels
of liquid reserves must be established
and maintained to enable the society to
continue to offer excellent member services
at affordable prices. The society’s
leaders are committed to that goal and expect
to make progress toward its achievement
both in the current year and in the
budget planning for 1991 now underway.

Joseph Boykin, Treasurer 



Figure 1. Computer Society income structure, 1989 actuals, $17.3 million.

Figure 2. Computer Society expense structure, 1989 actuals, $17.6 million.
Report of independent accountants


IEEE Computer Society Balance Sheets 
December 31, 1989 and 1988 


Assets                                              1989          1988

Current assets:
  Cash, including interest-bearing accounts   $  163,600    $  520,000
  Investments (Note 3)                         1,897,100     2,374,500
  Accounts receivable, less allowance for
    doubtful accounts of $66,400 in 1989
    and $72,200 in 1988                        1,145,200     1,075,600
  Receivable from Institute of Electrical
    and Electronics Engineers, Inc. (Note 7)      53,500        56,900
  Conference receivables                         485,200       639,800
  Conference advances                            182,100       126,800
  Inventory (Note 2)                             657,800       487,800
  Prepaid expenses and other assets              815,400       839,400
Total current assets                           5,399,900     6,120,800
Fixed assets, net (Notes 2 and 4)              3,722,200     3,644,600
Total assets                                  $9,122,100    $9,765,400

Liabilities and Fund Balance

Current liabilities:
  Demand note payable to bank (Note 5)        $  343,100       398,500
  Current portion of long-term debt (Note 5)   1,182,700        84,200
  Accounts payable and accrued expenses        1,411,500     1,594,000
  Deferred income:
    Membership fees and subscriptions          3,234,700     3,252,600
    Conferences                                   27,600        15,000
    Advertising and other                        132,300       132,700
Total current liabilities                      6,331,900     5,477,000
Long-term debt, less current portion (Note 5)     97,000     1,277,100
Total liabilities                              6,428,900     6,754,100
Fund balance:
  Undesignated                                 2,468,000     2,819,300
  Designated, primarily for technical
    committees (Note 8)                          225,200       192,000
Total fund balance                             2,693,200     3,011,300
Total liabilities and fund balance            $9,122,100    $9,765,400


Board of Governors of the 
IEEE Computer Society: 

We have audited the accompanying
balance sheets of the IEEE Computer Society
(the society) as of December 31,
1989 and 1988, and the related statements
of revenue, expenses, and changes in
fund balance for the years then ended.
These financial statements are the responsibility
of the society’s management.
Our responsibility is to express an
opinion on these financial statements
based on our audits.

We conducted our audits in accordance
with generally accepted auditing standards.
Those standards require that we
plan and perform the audit to obtain reasonable
assurance about whether the financial
statements are free of material
misstatement. An audit includes examining,
on a test basis, evidence supporting
the amounts and disclosures in the financial
statements. An audit also includes
assessing the accounting principles used
and significant estimates made by management,
as well as evaluating the overall
financial statement presentation. We believe
that our audits provide a reasonable
basis for our opinion.

In our opinion, the financial statements
referred to above present fairly, in all material
respects, the financial position of
the IEEE Computer Society as of December
31, 1989 and 1988, and the results of
its operations for the years then ended, in
conformity with generally accepted accounting
principles.

Coopers & Lybrand 

Washington, DC 

April 4, 1990 


Notes to financial 
statements 

1. Organization and purpose 

The IEEE Computer Society (the society)
is organized within the Institute of
Electrical and Electronics Engineers,
Inc. (IEEE), an organization exempt
from income tax, pursuant to Internal
Revenue Code section 501(c)(6). Within
the bylaws of IEEE, delegation of the responsibility
for the society’s operations
has been placed with the society’s Board
of Governors and Executive Committee.
The society’s constitution states that
“The society shall be scientific, literary,
and educational in character. The society
shall strive to advance the theory, practice,
and application of computer and information
processing science and technology
and shall maintain a high professional
standing among its members. The
society shall promote cooperation and
exchange of technical information
among its members and to this end shall
IEEE Computer Society

Statements of Revenue, Expenses, and Changes in Fund Balance
for the years ended December 31, 1989 and 1988

                                                      1989          1988

Revenue:
  Computer Society membership fees             $ 1,256,000   $ 1,183,900
  Periodical subscriptions and
    other publication activities                10,568,700     9,236,500
  Conventions, conferences and
    other technical activities                   4,895,800     5,083,400
  Interest income                                  127,800       141,300
  Other income                                     277,200       381,200
Total revenue                                   17,125,500    16,026,300

Expenses:
  Periodical and publication activities         10,063,900     9,094,500
  Conventions, conferences and
    other technical activities                   3,963,300     3,953,800
  Administration                                 3,416,400     2,851,200
Total expenses                                  17,443,600    15,899,500

Excess of revenue over (under) expenses           (318,100)      126,800
Fund balance at beginning of year                3,011,300     2,884,500
Fund balance at end of year                    $ 2,693,200   $ 3,011,300


hold meetings for the presentation and discussion
of technical papers, shall publish
technical journals, and shall, through its
organization and other appropriate means,
provide for the needs of its members.”

2. Summary of significant accounting 
policies 

Reporting entity 

The accompanying financial statements
include all society accounts maintained at
the society’s offices in Washington, DC;
Los Alamitos, California; Brussels, Belgium;
Tokyo, Japan; and certain accounts
maintained at IEEE Headquarters. The accompanying
financial statements do not
include the accounts of society chapters
which operate directly under IEEE.

Income recognition

Income from annual membership fees
and periodical subscriptions is recognized
during the year to which it pertains. The society’s
share of revenue and expenses for
conferences partially or entirely sponsored
by the society is generally recognized
in the year in which the conference is
held.

Inventory

Inventory consists of tutorial books
and standards published by the society
and is stated at the lower of cost or net
realizable value. Cost is determined on
an average cost basis.

Fixed assets and depreciation

Fixed assets are recorded at cost
when purchased. The society provides
for depreciation of fixed assets by
charges to revenue at rates considered
adequate to amortize the cost of such
assets over their estimated useful lives
(5 to 10 years for office furniture and
equipment; 30 and 35 years for buildings)
on a straight-line basis.
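Straight-line depreciation charges an equal share of an asset's cost in each year of its estimated useful life. The following is a minimal sketch of that calculation. The function names and the $10,000, 5-year example asset are hypothetical, and the sketch assumes the full cost is amortized with no residual value, which the note does not state either way.

```python
def straight_line_annual_charge(cost, useful_life_years):
    """Annual depreciation charge under the straight-line method.

    Assumes the whole cost is written off over the useful life
    (no residual value); that is an assumption of this sketch.
    """
    if useful_life_years <= 0:
        raise ValueError("useful life must be positive")
    return cost / useful_life_years


def book_values(cost, useful_life_years):
    """Net book value at the end of each year of the asset's life."""
    charge = straight_line_annual_charge(cost, useful_life_years)
    return [cost - charge * year for year in range(1, useful_life_years + 1)]


# Hypothetical example: office equipment costing $10,000 with a 5-year life.
annual = straight_line_annual_charge(10_000, 5)
schedule = book_values(10_000, 5)
print(annual)     # 2000.0
print(schedule)   # [8000.0, 6000.0, 4000.0, 2000.0, 0.0]
```

A shorter assumed life raises the annual charge to revenue, which is why the note discloses the ranges used for each asset class.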

When fixed assets are retired or otherwise
disposed of, the property and accumulated
depreciation accounts are
reduced by the applicable amounts and
any profit or loss is reflected in revenue.


3. Investments 

Investments consist of unrestricted 
deposits with IEEE and bear interest 
based on the average monthly balance 
maintained by the society. 

4. Fixed assets 

Fixed assets as of December 31 are 
shown in Table 1. 

Depreciation expense was $290,500
and $296,900 in 1989 and 1988, respectively.

5. Notes payable 

Notes payable as of December 31 are 
shown in Table 2. 

The note payable, due May 1, 1990, is
collateralized by a first lien on all gross
revenues of the society and a mortgage on
the Washington, DC, property. Repayment
is made in graduated amounts
through the term of the note with the balance
payable on May 1, 1990.

The demand note payable due May 1, 
1990, is unsecured. Repayment is made 
in equal monthly principal installments 
of $5,536, plus interest, with the balance 
payable on May 1, 1990. 

The note payable, due September 25,
1992, is collateralized by certain equipment
which was purchased with the proceeds
of the note. Repayment is made in
equal monthly principal installments of
$4,616, plus interest, with the balance
payable on September 25, 1992.

Interest expenses relating to all of the 
above notes amounted to $183,800 and 
$150,000 in 1989 and 1988, respectively. 

In September 1988, the society entered
into a working capital loan agreement,
which provides for a maximum borrowing
of $300,000 at the bank’s floating
prime rate. At December 31, 1989, there
were no borrowings against this line of
credit.

Annual maturities of long-term debt 
outstanding as of December 31, 1989, are 
as follows: 1990 — $1,525,800; 1991 — 
$55,400; and 1992 — $41,600. 
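The maturities above can be reconciled with the repayment terms of the individual notes. The following is a minimal sketch, assuming principal only (interest ignored), twelve installments per full calendar year, and the remaining balance falling due in the final year. The function name is hypothetical, and the balances are the December 31, 1989, figures from Table 2.

```python
def principal_by_year(balance, monthly_principal, start_year, final_year):
    """Split an outstanding balance into yearly principal repayments.

    Each year before the final one repays 12 equal installments;
    the final year repays whatever balance remains.
    """
    schedule = {}
    for year in range(start_year, final_year + 1):
        if year == final_year:
            due = balance
        else:
            due = min(balance, 12 * monthly_principal)
        schedule[year] = due
        balance -= due
    return schedule


# Equipment note: $152,400 outstanding, $4,616 monthly principal, due in 1992.
equipment_note = principal_by_year(152_400, 4_616, 1990, 1992)
print(equipment_note)  # {1990: 55392, 1991: 55392, 1992: 41616}

# The two notes due May 1, 1990, mature in full during 1990.
due_1990 = 1_127_300 + 343_100 + round(equipment_note[1990], -2)
print(due_1990)        # 1525800
```

Rounded to the nearest hundred dollars, the sketch gives $55,400 for 1991 and $41,600 for 1992, and the 1990 total of $1,525,800 matches the amount shown as due within one year.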

6. Pension plan 

The society is a member of a defined-benefit
pension plan sponsored by IEEE.
The IEEE plan covers substantially all
IEEE employees, including those of the
society. It is the policy of IEEE to fund
pension costs accrued.

Statement of Financial Accounting
Standards (SFAS) No. 87, “Employers’
Accounting for Pensions,” requires that
certain disclosures be made of the actuarial
present value of benefit obligations,
the projected benefit obligation, the fair
value of the available plan assets, and the
accrued pension costs. Such disclosures
Table 1. Fixed assets as of December 31.

                                            1989          1988

Land                                  $1,334,400    $1,334,400
Buildings and improvements             2,008,400     1,879,500
Warehouse equipment                       15,500        15,500
Office furniture and equipment         1,369,400     1,152,400
                                       4,727,700     4,381,800
Accumulated depreciation              (1,005,500)     (737,200)
                                      $3,722,200    $3,644,600


Table 2. Notes payable as of December 31.

                                           Annual
                                    interest rate         1989         1988

Note payable,
  balance due on May 1, 1990                Prime   $1,127,300   $1,153,500
Demand note payable,
  balance due on May 1, 1990                10.5%      343,100      398,500
Note payable, balance
  due on September 25, 1992                  9.5%      152,400      207,800
                                                     1,622,800    1,759,800
Less: Amount due within one year                     1,525,800      482,700
                                                    $   97,000   $1,277,100
are not presented for the society because
the structure of the IEEE plan does not
readily permit the plan’s assets and benefit
obligation data to be determined for
each individual society. Based on actuarial
valuations of the IEEE plan, assuming
a discount rate and an expected long-term
rate of return on assets of 8.5 percent and
an increase in the level of compensation
of 6.5 percent, the IEEE plan assets exceeded
the projected benefit obligation at
December 31, 1989 and 1988. The society
was allocated no pension expense in 1989
or 1988.

7. Related-party transactions 

Certain general and administrative expenses
incurred by IEEE Headquarters
and charged to the society amounted to
$1,015,100 in 1989 and $898,600 in
1988. Other transactions undertaken in
the normal course of business between
the society and IEEE have been reflected
in the society’s financial statements.

8. Designated fund balance 

The Board of Governors of the society
has designated a portion of surplus funds
received from conferences for use by
technical committees in accordance with
the society’s policy on conference surplus
accounts. The designated amounts
are calculated based on a formula contained
in the policy, but in no case can
they exceed $30,000 per technical committee
conference.
BUILDING IN ADA GOES FASTER
WITH THE RIGHT TOOLS.


If you’re a programmer using Ada to build advanced weapons or next-generation telecommunications
systems, TeleSoft has the tools you need to increase your productivity and lower your costs. □ Many of our
customers have produced higher quality software in 40 to 60 percent less time compared to similar projects


in other languages. □ Heading the list of tools is the TeleGen2™ Ada Compiler. Other tools include our real-time


embedded tool kits, Source Level Debugger, Global Optimizer, Ada Profiler, Ada Language Tools, and



McDonnell Douglas and
General Dynamics are using
TeleGen2 Ada Development
Systems targeting embedded
1750A processors
to help upgrade existing
fighter aircraft. They are
also using TeleGen2 Ada to
create real-time software
for advanced technology
avionics programs.


TeleArcs, an integrated programming environment for software engineering in Ada. □ We don’t just drop 
the tools on your desk and leave, either. TeleSoft provides extensive training, total support, and a unique combination
of software systems integration, Ada education and consulting from our subsidiary, Ada Systems
Development Corporation. All to help Ada programmers maximize their productivity. □ For complete details 


family of Ada productivity tools and services, call TeleSoft today. And we’ll help you get up to speed fast. 








"We recently delivered a real-time 
system to the Department of 
Defense that required critical 
response times. We used TeleGen2 
Ada on a Sun-3 host, targeting an 
embedded Motorola 68030. TeleGen2
Ada provided us a complete,
robust Ada development environment 


TeleSoft is the industry leader in real-time,
mission-critical embedded systems
tools. To help programmers get these

that had all of the real-time features
we required. TeleSoft's integrated
Ada environment, TeleArcs,
contributed significantly to our
programmers' productivity. Most
importantly, TeleSoft backed up their
product with excellent customer
support. I am really pleased with
TeleSoft's responsiveness to our needs."
Marty Sigona 
Software Engineer, 

Tiburon Systems, Inc. 




systems up and running quickly, we provide
entire suites of target-specific tools,
from linkers to Ethernet downloaders.

□ Embedded targets now supported by
TeleGen2 Ada tools include the Motorola
680X0 and 88000 family, Intel 386,
and MIL-STD-1750A. Platforms for TeleSoft
technologies include the Sun family,
VAX, IBM 370 and IBM RISC System
6000, Cray supercomputers, plus PC
DOS, 386 UNIX and Mac IIs. □ TeleSoft
customers include General Dynamics,
McDonnell Douglas, Martin Marietta,




Programmed for Productivity 


TeleSoft's debugging capabilities
include extensive interfaces to in-circuit
emulators and many other
tools, including the Source Level
Debugger. This particular tool gives
programmers an easy, clear, natural
way to understand and examine
LETTERS TO THE EDITOR


Software shortchanged 


To the Editor: 

The recent Computer article dealing
with unobtrusive real-time system monitoring
(J. Tsai, K. Fang, and H. Chen,
“A Noninvasive Architecture to Monitor
Real-Time Distributed Systems,” Computer,
Mar. 1990, pp. 11-23) presents a
nice view of the hardware side of the
problem (at least for older style microprocessor
architectures). But it fails to
demonstrate much appreciation for the
software side and as a result provides a
somewhat misleading view of the state of
the art in real-time system monitoring.

The approach described in the article
addresses only the monitoring of application
software that is essentially static in
nature, that is, where monitor condition
attributes (addresses) can be statically
deduced from system symbol table, link,
and load information and statically
loaded into a qualification control unit
(QCU). The hardware rationale for such
monitoring restrictions is well understood;
it is the same approach implemented
in many similar monitoring systems,
including our own [1]. What is disappointing
is that the article fails to mention
that modern real-time application
software written in a high-level programming
language rarely conforms to the
static structure assumed by the monitor.
For example, implementing even a
simple breakpoint on access to a variable
requires dynamic action to load the
QCU if that variable is allocated dynamically
on a runtime stack at the time
the procedure containing its declaration
is entered, because in general the
address of the memory allocated to the
variable is known only at runtime.
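The point about stack-allocated variables can be illustrated with a small simulation. This is a sketch under stated assumptions, not the QCU hardware described in the article: the stack layout constants, the function names, and the single-address "watch register" model are all hypothetical.

```python
STACK_TOP = 0x8000   # hypothetical initial stack pointer (stack grows downward)
FRAME_SIZE = 16      # hypothetical size in bytes of one activation record
LOCAL_OFFSET = 4     # hypothetical offset of the watched local within its frame


def address_of_local(call_depth):
    """Address of the watched local when its procedure is entered at call_depth."""
    frame_base = STACK_TOP - (call_depth + 1) * FRAME_SIZE
    return frame_base + LOCAL_OFFSET


def monitor(depths, reload_on_entry):
    """Report, per procedure entry, whether the watch address matches the local.

    With reload_on_entry False, the watch address is loaded once, as if it had
    been deduced from static symbol-table and load information. With True, it
    is reloaded at each procedure entry, modeling the dynamic action the
    letter says is required.
    """
    watch = address_of_local(depths[0])
    results = []
    for depth in depths:
        actual = address_of_local(depth)
        if reload_on_entry:
            watch = actual
        results.append(watch == actual)
    return results


# The same procedure is entered at call depths 0, 1, and 3.
print(monitor([0, 1, 3], reload_on_entry=False))  # [True, False, False]
print(monitor([0, 1, 3], reload_on_entry=True))   # [True, True, True]
```

In this model a statically loaded watch address catches only the activation it was computed for, while reloading at procedure entry tracks every activation.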

As a result, the discussion by Tsai et
al. regarding “Monitoring in different
levels of abstraction” and the claim on
p. 17,

The QCU is actually a hardware implementation
of software breakpoints defined by
the user... that gives the maximum flexibility
required to support general software
breakpoint conditions.

are quite misleading, since they have
validity only in the very narrow context
of totally static real-time application
software.


The basis of the software side of real-time
system monitoring can be found in
the work by Plattner [2], cited by Tsai et al.
but dismissed as only showing “... limitations
of the monitoring activity that can
be achieved by the monitor.” It is essential
that these limitations be understood
by both hardware and software designers.
As we describe in a retrospective assessment
of our own monitoring system [1],
attempts to use this sort of statically defined
hardware monitor on dynamic application
software result in monitoring
limitations on condition formulation and
in potentially anomalous monitor behavior.

I encourage real-time hardware and
software designers to study the entire
collection (Tsai et al. and References 1
and 2) to gain a full appreciation of the
complete set of problems that must be
addressed in an unobtrusive monitoring
system and of the merits of alternative
approaches (for example, the kernel-supported
approach suggested in Tokuda,
Kotera, and Mercer [3]).

Ray Ford 

University of Kansas 


References 

1. D. Lyttle and R. Ford, “A Symbolic Debugger
for Real-Time Embedded Ada
Software,” Software: Practice and Experience,
Vol. 20, No. 5, May 1990, pp. 499-514.

2. B. Plattner, “Real-Time Execution
Monitoring,” IEEE Trans. Software Eng.,
Vol. SE-10, No. 6, Nov. 1984, pp. 756-764.

3. H. Tokuda, M. Kotera, and C. Mercer,
“A Real-Time Monitor for a Distributed
Real-Time Operating System,” ACM
SIGPlan Notices, Vol. 24, No. 1, Jan. 1989,
pp. 68-77.


Authors’ Reply: 

A distributed system can be monitored
with different degrees of intrusiveness.
At one end of the spectrum are software-only
probes. Such systems are referred to
as intrusive monitors (used by Marinescu
et al. [1]) or invasive monitors (used by
Tokuda, Kotera, and Mercer [2]) because
they have the greatest effect on the execution
of the system being monitored. At
the other end of the spectrum are systems
with extensive hardware support for the
monitoring function. Such systems that
do not affect the behavior of the systems
being monitored are called nonintrusive
or noninvasive monitors. According to
Marinescu’s [1] and Tokuda’s [2] definitions,
Ford’s software monitor is an invasive
one. Obviously, each approach has its
own application domains.

We don’t think our article misstates
the state of the art in real-time system
monitoring. It clearly states (see p. 12)
that our principal objective is to develop
a hardware-based noninvasive real-time
monitoring system to ensure minimal interference
in the execution of a target
real-time distributed computing system.
In fact, the hardware approach for monitoring
is quite popular in industry. Because
the magazine limits the number of references,
only the most significant related articles
were listed. For thorough discussion of key
aspects of monitoring functionality, readers
are encouraged to refer to Marinescu [1].
For a comparison of various
monitoring approaches, please refer to
our Transactions paper [3].

In addition, Ford’s statements on the
nature of application software monitored
by our system are incorrect. The monitor
does not assume that the application software
has to be a static structure. The setting
of the trigger conditions on the QCU
is determined by the abstraction level of
the events of interest. If the user is interested
in the process-level behavior of the
monitored system, the trigger condition
attributes can be identified without caring
about the memory allocation strategy
of the programming language for the target
system. The memory allocation strategy
has nothing to do with the identification
of the process-level events (see the
section on pp. 19-20, “Setting trigger
conditions for process-level monitoring”).
More specifically, the main restriction
of the approach is in the hardware
part, not in the software part.

As mentioned in our article, the approach
is based on the assumption that
the target system consists of MC68000 computers.
That is, the hardware architecture of
the target system is a bus-based architecture
in which all execution information
passes through the internal buses between
the CPU and the main memory. In
this case, even the dynamic location of a
variable can be derived from the runtime
information collected from the buses,
although “postprocessing”
the collected information is complicated.
If the target system is different (for example,
in modern systems that use caching
and prefetching), the monitoring system
architecture of our approach will require
modifications, as stated in our article.

Jeffrey Tsai, K.Y. Fang, 

and H.Y. Chen 

University of Illinois at Chicago 

Correction


The following corrects the figure published on p. 56 of the
June 1990 issue of Computer. Because two shades of black
appeared to be continuous when printed, the speech bar chart in
the figure did not properly illustrate the difference between the
“Read only processing by block” and the “Unoptimized trace”
portions of the bars. -Ed.



Figure 4. System-level optimizations. 

Fault-Tolerant Systems


Adit D. Singh, University of Massachusetts at Amherst 
Singaravel Murugesan, Indian Space Research Organization 


Two mishaps early this year forcefully reminded us how
important computer systems’ reliability has become to human 
well-being in this last decade of the 20th century. In January, 
large parts of AT&T’s long-distance telephone switching network 
failed because of a software problem. Then, in February, a “fly by
wire” Airbus A320 aircraft crashed in
Bangalore, India. While a definitive cause 
for the accident has yet to be established 
(preliminary investigations suggest that 
the control computers might well have 
performed to specifications), the fact that 
the A320 employs a fully electronic cock¬ 
pit with no mechanical backup for control 
has raised much speculation about the 
computers’ role in the crash. 

These incidents underscore the fact that,
as computers play a larger role in everyday
life and our dependence on computer-automated
services and equipment increases,
we face a rapidly growing need for fault-tolerant
computing systems: systems
that produce correct results or actions even
in the presence of faults or other anomalous
or unexpected conditions. Such systems
were first developed for high-risk and
life-critical applications, such as aero¬ 
space and nuclear control systems. Today, 
we also find the consequences of failure in 
more commonplace applications unac¬ 
ceptable. For example, an unreliable auto¬ 
mated teller machine can cost a bank thou¬ 
sands of dollars in lost customers (and 
perhaps substantially more in lawsuits). 

Reliability has, in fact, concerned 
people since the infancy of computing. 
Early relay and vacuum tube machines 
were notoriously unreliable. Such designs 
generally employed rudimentary fault tol¬ 
erance techniques such as error-detection 
circuits and repeated execution to ensure 
correct results. The introduction of solid- 
state devices and integrated circuits saw a 
dramatic improvement in component re¬ 
liability through the 1950s and 60s. This 
appeared to reduce the need for fault tolerance
in all but the largest systems and, of
course, those deployed in critical applica¬ 
tions. But the respite was short-lived. In¬ 
creased system complexity along with the 
greater susceptibility to noise and radia¬ 
tion that results from the very low switch¬ 
ing energies in modern VLSI devices has 
again brought reliability concerns and 
fault-tolerant design to center stage. 

The basic idea behind building in a fault- 
tolerance capability is to provide the sys¬ 
tem with extra (redundant) resources, 
beyond the minimum needed to achieve 
the computing requirements. These extra 
resources can help overcome the effects of 
a malfunction. The redundancy can take 
the form of extra hardware, which can vote 
out an erroneous signal or switch in a spare 
to replace a failing subsystem, or addi¬ 
tional software, which can allow success¬ 
ful reexecution of a program following 
detection of a failure caused by extraneous 
noise. The latter design also requires some 
time redundancy — a performance slack 
built into the design so that the system 
has time to reexecute the program with¬ 
out failing the required performance 
specifications. 

The idea of employing redundancy for 
fault tolerance is basically similar to pro¬ 
viding a spare wheel in the trunk of a car 
against the possibility of a flat tire. Of 
course, any redundancy introduced to provide
fault tolerance increases system cost.
Therefore, any system requires the right 
kind of redundant resources and in the 
right measure to maximize reliability for 
the given additional cost. This demands a 
good understanding of the threat faced by 
the system and the types of failures ex¬ 
pected. New cars normally do not come 
with a spare battery, probably because 
complete and unexpected battery failure is 
rare. The probabilities of failure make a 
spare tire much more useful. 

Obviously, carrying spares for every 
possible contingency would be prohibi¬ 
tively expensive and impractical. In fact, 
no system design can provide fault toler¬ 
ance for every conceivable failure sce¬ 
nario. The trick is to achieve the desired 
level of dependability by building in pro¬ 
tection against the most likely failures, 
within the given design constraints. Most 
commonly, fault-tolerant systems provide
protection against failure in any single
module or component, since the likelihood 
of multiple failures occurring simultane¬ 
ously is very small. Often, the system can 
handle additional failures if they occur 
after it has achieved proper recovery from 
the first failure. Many designs take advan¬ 
tage of the fact that the vast majority of 
computer system failures are caused by 
transient or intermittent component mal¬ 
function. A system can recover from such 
a failure by restoring the correct state 
without actually replacing the failing 
component. 


The challenge 

Though many of the basic ideas behind 
fault-tolerant design are conceptually 
simple, in practice, designing a computer 
system to meet desired dependability 
specifications proves more complex than 
might appear at first glance. To begin with, 
it is difficult to statistically characterize 
beforehand the type and frequency of hard¬ 
ware and software failures likely to afflict 
a system. While reasonable models do 
exist for predicting permanent hardware 
failures in electronic components, the 
occurrence of transient and intermittent
failures in hardware relates to such factors 
as electrical isolation, environmental 
noise, physical layout of the hardware, and 
proper timing and synchronization, which 
are difficult to model accurately enough 
for meaningful failure-rate predictions. In 
the case of software, we can sometimes 
estimate field failure rates based on the 
failure rates observed during the debug¬ 
ging process. 

Having decided on the types of failures 
to protect against, the designer of a fault- 
tolerant system must select the redundancy 
techniques best suited to the application. A 
complete system typically employs a range 
of different redundancy techniques in its 
subsystems. For example, error-correcting 
codes can protect memory. The processor 
might employ instruction retry to recover 
from a temporary transient fault. As fur¬ 
ther protection, multiprocessor systems 
might automatically assign jobs on a failed 
processor to another processor. Coding or 
duplication can protect disk storage. Networked
systems can have multiple paths
between nodes so that a single failure does 
not disconnect the network. 

The designer faces the challenge of se¬ 
lecting and integrating such diverse fault- 
tolerant schemes into the system while 
ensuring that the effects of any failures 
stay within the appropriate subsystem 
boundaries and do not disable the entire 
system. The system must, of course, also 
meet performance specifications. These 
can be quite demanding in real-time con¬ 
trol applications, where the allowable time 
to mask or detect and correct a failure must 
be much less than the time constraint of the 
process controlled. In addition, many
other real-life constraints usually apply, including cost,
weight, volume, power consumption,
length of development cycle, flexibility
(ease of change), and maintainability.

Unfortunately, the increasing complex¬ 
ity of modern computing systems makes it 
more difficult to ensure high dependabil¬ 
ity. For example, the high level of integra¬ 
tion of new VLSI devices makes it virtu¬ 
ally impossible to test them completely. 
This leaves open the possibility that a fault 
already present in a circuit might interact 
with a new failure in the field to defeat a 
fault-tolerance mechanism designed to 
handle only one fault at a time. Similarly, 
as software becomes very large and com¬ 
plex, designed to respond to many real¬ 
time interrupts, the processes of verifica¬ 
tion, validation, and testing become ex¬ 
tremely difficult, if not impossible. Unde¬ 
tected software errors can show up during 
real-life operation and cause the system to 
fail. 

A more fundamental problem with the 
design approach outlined above, which 
first enumerates the failure scenarios and 
then designs protection against likely fail¬ 
ures, is that it will fail to protect against the 
unexpected. This particularly concerns us 
in ultra reliable systems, where failure 
often results from situations not covered 
by the fault-tolerance mechanisms. Pre¬ 
dicting the probability of such failures and 
hence the reliability of the system is ex¬ 
tremely difficult. Protecting against de¬ 
sign errors and unexpected failure situ¬ 
ations poses the principal challenge facing 
designers of the current generation of ultra
reliable fault-tolerant systems.

Our discussion clearly suggests that 
building fault tolerance into a system can¬ 
not be an afterthought, an add-on approach. 
Design for fault tolerance needs to begin 
with the early stages of system conceptu¬ 
alization, requirement specification, and 
system design. 


This issue opens with an introductory
article by Victor Nelson
explaining the basic concepts of 
the field. The next article, by Daniel 
Siewiorek, discusses how these fault-tol¬ 
erance techniques are incorporated in prac¬ 
tical commercial systems. The rest of the 
articles in the issue discuss recent advances 
in several other areas important to fault- 
tolerant computing: software fault toler¬ 
ance, error-detection strategies, coding, 
fault tolerance in VLSI circuits, and relia¬ 
bility modeling. 

This list of important topics is far from 
exhaustive. Because of the restrictions on 
the size of this special issue, we had to 
leave out many other important subjects. 
For these and additional state-of-the-art 
information, we refer the reader to the
proceedings of the annual IEEE Interna¬ 
tional Symposium on Fault-Tolerant 
Computing, held each summer, and the 
biannual special issues on fault-tolerant 
computing in IEEE Transactions on 
Computers. ■ 

Fault-Tolerant Computing: Fundamental Concepts


Victor P. Nelson 
Auburn University 


Digital systems have been entrusted
with increasingly more critical re¬ 
sponsibilities, requiring high de¬ 
pendability. Often the use of high-quality 
components and design techniques does 
not sufficiently reduce the likelihood of 
system failures, and means must be pro¬ 
vided to tolerate faults in the system. 

This article reviews the basic concepts 
of fault-tolerant computing, focusing on 
hardware. It examines failures, faults, and 
errors in digital systems and defines meas¬ 
ures of dependability, which dictate and 
evaluate fault-tolerance strategies for dif¬ 
ferent classes of applications. The various 
mechanisms for implementing a fault-tol¬ 
erance strategy are reviewed, including 
error detection, fault masking, fault con¬ 
finement, system reconfiguration and re¬ 
pair, and system recovery. 

Failures, faults, 
and errors 

When applied to digital systems, the 
terms failure, fault, and error have differ¬ 
ent meanings. 1,2 Failure denotes an 
element’s inability to perform its designed 
function because of errors in the element or
its environment, which in turn are caused
by various faults.

Fault tolerance is crucial in military and
aerospace computing, and desirable in other
applications. This review discusses basic
concepts, mechanisms, and strategies and
sketches future directions.

A fault is an anomalous physical condi¬ 
tion. Causes include design errors, such as 
mistakes in system specification or im¬ 
plementation; manufacturing problems; 
damage, fatigue, or other deterioration; 
and external disturbances, such as harsh
environmental conditions, electromag¬ 
netic interference, ionizing radiation, un¬ 
anticipated inputs, or system misuse. 
Faults resulting from design errors and 
external factors are especially difficult to 
model and protect against, because their 
occurrences and effects are hard to pre¬ 
dict. 

An error is a manifestation of a fault in 
a system, in which the logical state of an 
element differs from its intended value. A 
fault in a system does not necessarily 
result in an error. An error occurs only 
when a fault is “sensitized”; in other 
words, for a particular system state and 
input excitation, an incorrect next state 
and/or output results. A fault is referred to 
as latent if it has not yet been sensitized in 
the system. The term soft is often applied 
to errors that persist after the originating 
fault disappears. Once corrected, soft errors
usually leave no damage in the system.


Hierarchical models of faults and er¬ 
rors. Device testing and fault-tolerant 
design require fault and error modeling at 
one or more levels of design abstraction, 
with various trade-offs between accuracy 
and ease of modeling and analysis. At the 
lowest level, faults are technology dependent.
Such physical defects as shorts or
opens in metal or polysilicon signal lines 
can alter voltages, switching times, and 
other properties. 3 External disturbances 
also work at this level, affecting signal 
lines, charge storage, and other properties. 

At the logical level, a digital system is 
modeled with gates and memory elements, 
with all signals represented as binary val¬ 
ues. Low-level fault-tolerance strategies 
are designed to detect or mask faults that 
produce erroneous logical values. Because 
of its simplicity, the “stuck-at” model is 
the most widely used logical fault model, 
assuming that a fault manifests itself as a 
fixed logical value on a signal line. A more 
complex model is the “bridging” fault, in 
which coupling between signal lines re¬ 
sults in the logical value of one line affect¬ 
ing the value of another. Other complex 
faults alter the basic logical function of a 
gate, as often happens in programmable 
logic arrays, where the presence or ab¬ 
sence of connections in an AND/OR array 
results in implicants being added to or 
removed from a function. 
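
The stuck-at model and the idea of a fault being "sensitized" can be illustrated with a small simulation. The sketch below is only an illustration under assumptions not in the article: the three-input circuit y = (a AND b) OR c, the line names, and the helper functions are all hypothetical. It lists the input vectors for which a given stuck-at fault produces an error at the output; for every other input the fault stays latent.

```python
from itertools import product

def circuit(a, b, c, stuck=None):
    """Evaluate y = (a AND b) OR c on binary inputs.

    `stuck` is a hypothetical (line_name, value) pair, for example
    ("n1", 0) for a stuck-at-0 fault on the AND-gate output.
    """
    lines = {"a": a, "b": b, "c": c}

    def read(name):
        # A stuck line always reads as its fixed value.
        if stuck is not None and stuck[0] == name:
            return stuck[1]
        return lines[name]

    lines["n1"] = read("a") & read("b")
    lines["y"] = read("n1") | read("c")
    return read("y")

def sensitizing_inputs(stuck):
    """Input vectors for which the faulty output differs from the good one."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, stuck=stuck)]

# Stuck-at-0 on the AND output is visible only when a = b = 1 and c = 0.
print(sensitizing_inputs(("n1", 0)))
```

In this toy circuit the stuck-at-0 fault on the internal line is sensitized by a single input vector, which is why such faults can remain latent for a long time in normal operation.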

At higher levels of abstraction (regis¬ 
ters, arithmetic logic units, processors, 
etc.) faults typically appear as changes in 
the module’s behavior, as represented by 
its truth table or state table. At this level 
fault modeling is usually more abstract to 
facilitate simulation at the behavioral 
level; hence, accuracy is often sacrificed. 

Fault properties. A fault can be classi¬ 
fied by its duration, nature, and extent. The 
duration of a fault can be transient, inter¬ 
mittent, or permanent. A transient fault, 
often the result of external disturbances, 
exists for a finite length of time and is 
nonrecurring. A system with an intermit¬ 
tent fault oscillates between faulty and 
fault-free operation. Usually, an intermit¬ 
tent fault results from marginal or unstable 
device operation. Permanent or “hard” 
faults are device conditions that do not 
correct with time. They result from compo¬ 
nent failures, physical damage, or design 
errors. Transient and intermittent faults 
typically occur with greater frequency than 
permanent faults and are more difficult to 
detect, since they may disappear after 
producing errors. 

The nature of a fault is determined by its 
behavior in the system. A logical fault 
produces errors that can be represented as 
logical values, while errors resulting from 
indeterminate faults do not have logical 
equivalents. For example, the shorting of a 
logic gate input to ground can be modeled 
as a stuck-at-0 fault at that input. However, 
the behavior of a gate input whose signal 
voltage floats between the logic 1 and 0 
thresholds cannot be represented as a 
simple logical value. Other indeterminate 
faults affect propagation times and other 
electrical parameters, making them diffi¬ 
cult to model. 

The extent of a fault is determined by the 
area affected at the level of abstraction 
being considered: Local faults affect 
single components, and global faults affect 
multiple components. Because of cost con¬ 
straints, many fault-tolerance and device¬ 
testing strategies address only single, sta¬ 
tistically independent faults. Multiple 
faults require more extensive fault models 
and global approaches to fault tolerance. 
However, multiple faults become more 
likely at increased very large scale integra¬ 
tion levels. In addition, external distur¬ 
bances tend to have global effects, espe¬ 
cially in military and aerospace applica¬ 
tions subject to electromagnetic interfer¬ 
ence and ionized-particle radiation. 
Hence, multiple faults are receiving in¬ 
creasing attention. 


Evaluating 
dependability and 
fault tolerance 

The goal of fault-tolerant design is to 
improve dependability 2 by enabling a sys¬ 
tem to perform its intended function in the 
presence of a given number of faults. Note, 
however, that a fault-tolerant system is not 
necessarily highly dependable, nor does 
high dependability necessarily require 
fault tolerance. 

Dependability can be quantified by deterministic
or probabilistic measures. A
deterministic goal for a fault-tolerant system
might be that no single fault can cause
system failure. Many commercial system 
manufacturers advertise their systems’ 
ability to tolerate some maximum number 
of processor, disk drive, and other compo¬ 
nent failures. However, such advertising 
does not mention the frequency or likeli¬ 
hood of such failures, or their cost. 

Reliability and availability. Dependability
is most often quantified probabilistically
in terms of either reliability or availability.
Reliability, R(t), is the conditional
probability that a system can perform its
designed function at time t, given that it
was operational at time t = 0. Thus R(t) is
a function of the fault processes affecting 
the system, and of any mechanisms that 
prevent system failure when a fault occurs. 
Many real-time systems, such as those 
used for aircraft or nuclear power plant 
control, require a high R(t) because a single
error could be fatal. For long-life unat¬ 
tended systems, such as those used in deep- 
space probes, the probability of multiple 
faults increases dramatically with mission 
time. Automatic repairs must be made with 
spare resources to maintain reliability over 
the life of the mission, although some 
performance degradation may be accept¬ 
able during these repairs. 

Where cost prohibits sufficient fault 
tolerance to ensure continuous error-free 
operation, some amount of downtime for 
repair is inevitable. Availability, A(t), is a 
useful measure for systems subject to fail¬ 
ure and repair; it is defined as the probabil¬ 
ity that a system is operational at time t. 
Availability is often expressed as a steady- 
state value, either as the probability that 
the system is operational at any random 
time, or as a given amount of downtime 
over a specified interval. For example, the 
availability goal for the Bell System elec¬ 
tronic switching system was specified as 
two minutes of downtime per year. 4 Com¬ 
mercial systems, which must be affordable 
as well as dependable, are normally de¬ 
signed for high availability. They use fault- 
tolerant protocols and other operations to 
protect the database from contamination, 
while using redundant processors and 
other resources for diagnosis and repair. 
Some systems can continue operating at a 
degraded level during repair. 

Statistical mean values of system failure 
and repair times are often used in system 
evaluation. However, they can be mislead¬ 
ing, since they are computed over infinite 
time intervals rather than the relatively 
short lifetime of the evaluated system. The 
two most common parameters are “mean
time to failure” (MTTF), which is the 
expectation of the time at which the system 
will fail, and “mean time to repair” 
(MTTR), the expectation of the time to 
restore a failed system to correct operation. 
These two parameters are most often used 
to compute steady-state availability, given 
by 

$$A_{\text{steady-state}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} \qquad (1)$$

If a system is highly reliable — that is, if 
MTTF is large relative to MTTR — then 
availability is close to 1. For smaller MTTF 
values, availability varies significantly 
with repair time. Complete derivations of 
the above parameters and other reliability 
and availability measures are discussed 
elsewhere. 1 K.S. Trivedi discusses system 
reliability modeling in this issue of Com¬ 
puter. 5 
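
As a quick numerical illustration of the steady-state availability formula, the following sketch computes availability from MTTF and MTTR, and converts a downtime budget such as "two minutes per year" into an availability figure. The function names and the 365-day year are assumptions made for this example only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

def steady_state_availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)

def availability_from_downtime(downtime_minutes_per_year):
    """Fraction of a year the system is up, given a yearly downtime budget."""
    return 1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR

# A system that runs 999 hours between failures and takes 1 hour to repair.
print(steady_state_availability(999, 1))

# The availability implied by two minutes of downtime per year.
print(availability_from_downtime(2))
```

With MTTF fixed at 999 hours, raising MTTR from 1 hour to 10 hours lowers availability from 0.999 to about 0.990, which shows how strongly repair time matters when MTTF is not very large.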

Improving reliability with fault toler¬ 
ance. The effects of a fault-tolerant design 
strategy on system reliability can be ex¬ 
pressed as follows: 

$$R_{\text{system}} = \Pr\{\text{no fault}\} + \Pr\{\text{correct operation} \mid \text{fault}\} \times \Pr\{\text{fault}\} \qquad (2)$$

The first term is the probability that no 
fault will occur. It is maximized by “fault- 
intolerant” design, that is, by high-quality 
components, proofs of design correctness, 
and other formal design methodologies. If 
Pr {no fault} can be made sufficiently high, 
a target system reliability can be achieved 
without fault-tolerance strategies. 

The effects of fault tolerance on reliabil¬ 
ity are represented by the second term in 
Equation 2, which is the probability that a 
fault will occur but will not result in system 
failure, computed over all possible faults. 
Pr {correct operation | fault}, referred to as
the coverage of the fault-tolerance mecha¬ 
nism, is the conditional probability that a 
system will continue to operate correctly 
given the occurrence of a particular fault. 
Each coverage term is weighted by the 
probability that the corresponding fault 
will occur, so for a cost-effective system 
design, fault-tolerance mechanisms 
should be targeted at the most likely faults. 
Note that if fault probabilities are high, a 
system may be able to tolerate all of a given 
set of faults and yet not be sufficiently 
reliable for the application. Automatic 
fault-detection, diagnosis, repair, and re¬ 
covery mechanisms can reduce or elimi¬ 
nate downtime, improving availability. 
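
The weighting of coverage by fault probability in Equation 2 can be made concrete with a short calculation. This is a sketch under simplifying assumptions that are not in the article: the fault classes are treated as mutually exclusive, at most one fault occurs in the interval of interest, and the function name and numbers are purely illustrative.

```python
def system_reliability(fault_classes):
    """Evaluate Equation 2 over a list of (probability, coverage) pairs.

    Assumes the fault classes are mutually exclusive, so that
    Pr{no fault} = 1 - sum of the fault probabilities.
    """
    p_fault_total = sum(p for p, _ in fault_classes)
    if not 0.0 <= p_fault_total <= 1.0:
        raise ValueError("fault probabilities must sum to a value in [0, 1]")
    p_no_fault = 1.0 - p_fault_total
    # Each coverage term is weighted by the probability of its fault.
    tolerated = sum(p * coverage for p, coverage in fault_classes)
    return p_no_fault + tolerated

# A likely fault with high coverage and a rare fault with poor coverage.
print(system_reliability([(0.05, 0.99), (0.01, 0.50)]))

# Perfect coverage of the same faults would restore a reliability of 1.
print(system_reliability([(0.05, 1.0), (0.01, 1.0)]))
```

In the first call the likely fault class contributes 0.0495 to the result and the rare one only 0.005, so improving coverage of the likely fault pays off more for the same effort.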



A fault-tolerant-system designer must 
also consider performance, complexity, 
cost, size, and other constraints, all of 
which are affected by the redundancy and 
fault-tolerance strategies used. These costs 
must be weighed against such conse¬ 
quences of system failure as lost produc¬ 
tion or danger to life, which may be diffi¬ 
cult to quantify. 

Elements of fault- 
tolerance strategies 

Fault tolerance in a digital system is 
achieved through redundancy in hardware, 
software, information, and/or computa¬ 
tions. Such redundancy can be imple¬ 
mented in static, dynamic, or hybrid con¬ 
figurations. A fault-tolerance strategy in¬ 
cludes one or more of the following ele¬ 
ments: 

• Masking. Dynamic correction of gen¬ 
erated errors. 

• Detection. Detection of an error — a 
symptom of a fault. 

• Containment. Prevention of error 
propagation across defined bounda¬ 
ries. 

• Diagnosis. Identification of the faulty
module responsible for a detected error.

• Repair/reconfiguration. Elimination 
or replacement of a faulty component, 
or a mechanism for bypassing it. 

• Recovery. Correction of the system to
a state acceptable for continued operation.

For short-term ultrareliable operation, 
where no time is available for off-line fault 
diagnosis and repair, a static or passive 
configuration of elements is designed to 
mask a given maximum number of faults. 


Dynamic redundancy, on the other hand, 
involves the switching of modules or re¬ 
routing of communications as faults occur. 
The faulty components are detected, diag¬ 
nosed, and repaired or replaced. 

In a hybrid approach a static base con¬ 
figuration masks a given number of faults, 
while faulty modules are detected and 
replaced within the configuration. Hybrid 
redundancy is desirable for long-term 
ultrareliable applications in which the 
probability of multiple faults is high. 

High-availability applications do not 
necessarily require continuous error-free 
operation, although databases and other 
critical resources must be protected. In 
such cases, errors are detected and con¬ 
tained within replaceable modules, rather 
than masked. System operation is then 
degraded or halted to perform diagnosis, 
reconfiguration or repair, and recovery. 

Error detection, masking, and correc¬ 
tion. Component complexity affects the 
ability to distinguish errors from correct 
values. Errors occurring in data-storage 
components, such as registers and mem¬ 
ory, or during data transmission via buses 
or network links, are more easily detected 
than errors originating within modules that 
generate or transform data. Masking or 
correcting errors is more difficult, requir¬ 
ing multiple copies of an element or other 
redundancy so that correct data can be 
extracted from the redundant information. 
Error detection and correction can be con¬ 
current with normal system operations or 
executed off line during specified testing 
intervals. 

Error detection and correction codes.
Coding theory is the most widely developed
mechanism for error detection and
correction in digital systems, typically 
requiring less redundancy than other error 
detection and correction schemes. A 
code’s error detection and correction prop¬ 
erties are based on its ability to partition a 
set of $2^n$ $n$-bit words into a code space of $2^m$
words and a noncode space of $2^n - 2^m$
words. For most codes, each word
comprises $m$ bits of information and $k = n - m$
check bits. Each code is designed so that
a given number of errors transforms a code¬ 
space word into a word in the noncode 
space. Errors are detected by decoding 
circuits that identify any word outside the 
code space. Error correction is performed 
by more extensive decoding that uniquely 
associates a noncode-space word with the 
original code word transformed by the errors.

Figure 1. Replicated lockstep operation of modules with redundant outputs
checked in each clock cycle: (a) logic compared externally; (b) logic compared
on chip.


Within a single word, the number of 
errors detectable or correctable by a given 
code is related to the minimum separation 
or Hamming distance between the words 
of the code space. The distance is the 
minimum number of bit positions by which 
any two words from the code space differ. 
If two words differ by only one bit posi¬ 
tion, then an error in that bit transforms one 
word into the other. If the minimum dis¬ 
tance is 2, a single error can produce only 
a noncode word, with at least two errors 
required to transform one code word into 
another. If the minimum separation is 3, 
any noncode word produced by a single 
error is distance 1 from the original code 
word and at least 2 from any other code 
word, allowing the original word to be 
uniquely identified. 
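
The relationship between minimum distance and error handling described above can be checked directly on small codes. The sketch below is illustrative only; the function names are hypothetical, and the general rule it applies (a code of minimum distance d detects up to d - 1 errors, or corrects up to (d - 1) // 2) is the standard coding-theory result rather than a statement taken from this article.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of bit positions in which two words (given as ints) differ."""
    return bin(x ^ y).count("1")

def minimum_distance(code_words):
    """Smallest pairwise Hamming distance over a code space."""
    return min(hamming_distance(x, y) for x, y in combinations(code_words, 2))

def capabilities(d_min):
    """Errors detectable, or alternatively correctable, at distance d_min."""
    return {"detect": d_min - 1, "correct": (d_min - 1) // 2}

# Two information bits plus one even-parity bit: minimum distance 2.
parity_code = [0b000, 0b011, 0b101, 0b110]
print(minimum_distance(parity_code), capabilities(minimum_distance(parity_code)))

# One information bit repeated three times: minimum distance 3.
repetition_code = [0b000, 0b111]
print(minimum_distance(repetition_code), capabilities(minimum_distance(repetition_code)))
```

The parity code matches the distance-2 case in the text (single errors detected, none corrected), and the repetition code matches the distance-3 case (single errors corrected).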

Larger separations permit detection and/ 
or correction of greater numbers of errors, 
generally by increasing the size of the 
noncode space ($2^{m+k}$) relative to that of the
code space ($2^m$), making it more likely that
errors will result in noncode words. The 
cost of this increased coverage is usually a 
lower code efficiency (code bits versus 
total bits) or a more complex encoding 
algorithm. 

Error detection and correction codes 
vary widely in detection and correction 
properties, encoding and decoding com¬ 
plexity, and code efficiency. The most 
common codes include simple parity 
checks to detect errors in buses, memory,
and registers. Parity-based Hamming
codes detect and correct errors in memory; 
cyclic redundancy checks and other cyclic 
codes detect and correct errors in commu¬ 
nications channels and disk storage; m- 
out-of-n codes detect errors in micropro¬ 
gram control stores and other ROMs; and 
arithmetic codes detect errors originating 
within arithmetic logic units. 

Many computer memory subsystems 
include single-error correction and 
double-error detection using inexpensive 
Hamming-code-based support chips that 
efficiently encode and decode words dur¬ 
ing memory operations. Other commercial 
very large scale integration components 
include parity generators for buses and 
storage elements, and encoding/decoding 
circuits for disk drives, tapes, networks, 
and other communications channels. Some 
new VLSI components incorporate on- 
chip parity generation and checking logic; 
for example, the Advanced Micro Devices 
Am29300 chip set generates and checks 
parity on data paths to and from the device, 
and on internal data paths. In addition, 
several recent VLSI memories incorporate 
on-chip error detection and correction to 
mask memory cell faults arising in manu¬ 
facturing or normal operation. 
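
To show how a parity-based Hamming code corrects a single error, here is a minimal sketch of the textbook Hamming(7,4) code. It is not the circuit of any chip named above, and it provides single-error correction only; the extra overall parity bit needed for double-error detection is omitted. The bit layout (parity bits in positions 1, 2, and 4) and the function names are assumptions of this example.

```python
def encode(data_bits):
    """Encode 4 data bits into a 7-bit word; list index 0 holds position 1."""
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7  # even parity over positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # even parity over positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # even parity over positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def decode(word):
    """Return (data_bits, syndrome); a nonzero syndrome is the flipped position."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        w[syndrome - 1] ^= 1  # correct the single erroneous bit
    return [w[2], w[4], w[5], w[6]], syndrome

stored = encode([1, 0, 1, 1])
corrupted = list(stored)
corrupted[4] ^= 1  # flip the bit in position 5
print(decode(corrupted))
```

The three parity checks recomputed by the decoder form a syndrome whose value is the position of the flipped bit, so correction is a single bit inversion. Two simultaneous errors would be miscorrected by this sketch, which is why memory systems add the extra parity bit.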

Self-checking logic. Self-checking 
logic designs detect faulty logic circuits, 6 
especially in code checkers and other circuits
that could be single points of failure
in a system. 4 (Several experimental VLSI
designs have been implemented entirely 
with self-checking circuits.) Each self¬ 
checking circuit has coded inputs and out¬ 
puts, typically in the form of 2-bit “dual- 
rail” logic, which has two valid code words 
and two noncode words for each logic line. 
A circuit is classified as fault secure if, for 
any specified fault within the circuit, the 
circuit never produces an incorrect output 
code word when stimulated by a correct 
input code word. A self-testing circuit, on 
the other hand, outputs a noncode word for 
at least one code word input for each pos¬ 
sible fault. A totally self-checking circuit 
has properties of both fault-secure and self¬ 
testing circuits; hence, no internal fault can 
convert an erroneous input into a valid 
output, and at least one normally occurring 
input will detect each possible internal 
fault. 

Module replication for error detec¬ 
tion and masking. With circuits that gen¬ 
erate or transform information, complete 
module replication is often the only cost- 
effective approach for error detection and 
correction. Figure 1 shows the most 
straightforward approach to error detec¬ 
tion: The outputs of identical modules 
operating in lockstep are compared. Sev¬ 
eral commercial transaction-processing 
systems have been built around pairs of 
off-the-shelf microprocessors with com¬ 
parator circuits at their bus interfaces to 
detect processor faults (Figure 1a).

Simple disagreement detection indicates a fault but cannot identify the
faulty unit. The system must be interrupted for further diagnosis.
Continuous operation can be attained by using additional
error-detection mechanisms to make the duplicated modules
self-checking, as in the AT&T 3A electronic switching system processor,
which uses self-checking logic circuits. 4 Figure 2a shows that when
one module signals an error, it can be disabled while the other module
continues to supply correct information, effectively masking the fault
in the failed unit. Normally the disagreement detector between modules
is eliminated and all errors are assumed to be detected within the
redundant modules. Figure 2b shows how self-checking modules can be
built with off-the-shelf components: One of the configurations of
Figure 1 is duplicated, so four units and two comparators are needed
for continuous fault masking. This approach has been used in the
Stratus computer family and other systems.

Figure 2. Continuous operation with duplex self-checking modules: (a)
two self-checked modules; (b) four simple modules as two self-checked
pairs.

Continuous operation is often provided by using the majority vote of
the outputs of three or more identical modules, masking failures of the
minority. Triple modular redundancy has been used extensively in
ultrareliable systems for aerospace and industrial applications, with
two out of three votes masking single-module failures. Additional fault
coverage can be attained with N modules by deploying them in a hybrid
modular-redundant configuration, in which failed modules are replaced
within a triple modular-redundant core configuration. Hybrid
modular-redundant configurations can mask failures of all but two
modules, compared with a simple minority in M-out-of-N majority-voting
systems. 
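
A two-out-of-three vote can be sketched as a bitwise majority function. The code below is an illustration in software, assuming module outputs are integers of equal width; the function names are hypothetical and a hardware voter would implement the same Boolean expression per output line.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def vote_and_diagnose(outputs):
    """Return the voted value and the indexes of modules that disagree with it."""
    voted = majority_vote(*outputs)
    suspects = [i for i, out in enumerate(outputs) if out != voted]
    return voted, suspects


# Module 2 delivers a wrong word; the vote masks it and flags the module.
voted, suspects = vote_and_diagnose([0b1010, 0b1010, 0b0110])
```

Because the vote is taken bit by bit, the correct word is also recovered when two modules are wrong in different bit positions, although both are then flagged as suspects.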

A significant problem with module replication is synchronization of the
redundant modules. If comparison or voting is done in hardware, tight
coupling of the redundant modules is needed to ensure that comparison
or voting takes place on valid data samples. Fault-tolerant clocking
schemes and other means of synchronization have been studied
extensively, and several recent commercial VLSI chips include on-chip
support for duplex, master/checker operation. Figure 1b shows paired
master and checker chips operating in lockstep, with all corresponding
pins connected to the same input/output lines. Both chips receive all
inputs and perform all operations. The output lines are driven only by
the master, with output also routed into the corresponding pins of the
checker to on-chip comparators for comparison with values produced by
the checker. The result is indicated by a match or an error signal.

An alternative to tight coupling is to compare only selected outputs
from loosely synchronized units. In the SIFT system, 7 critical-process
outputs are exchanged by the redundant processors in each process step
and compared in subsequent process steps by a software voter. In the
space shuttle, selected data values are mathematically combined into
“compare words,” which are periodically exchanged and compared by
software in four redundant processors. 8

Voters and comparators, although typically much more reliable than the
redundant modules they protect, represent potential single-failure
points in replicated systems. Fault tolerance and reliability can be
increased by replicating the comparators or voters, usually at the
module inputs, as in the triple modular-redundant system stage of
Figure 3. Failure of any single voter or the module to which its output
is connected is masked by the voters at subsequent module inputs.
Redundancy schemes have also been extended to many nondigital devices
(motors, actuators, sensors) used in redundant systems to minimize the
number of single-failure points.

Protocol and timing checks. The behavior of most sequential logic
circuits and systems can be described by state machines or other
protocols. Protocol variation resulting from a fault can be detected
several ways without massive replication of modules. 9 Selected process
states or module outputs can be compared with predicted values or other
heuristic information, generated by alternative algorithms or off-line
units. Data values can be checked for proper structure or consistency
with previous or predicted values. Handshaking sequences between
elements involved in data transfers can be monitored by hardware or
software, especially over buses and network links. Operational
“capabilities” (the activities allowed by various processes) can be
verified before allowing an operation on a critical resource. Such
approaches often reduce hardware redundancy requirements but may be
more difficult to implement, requiring application-specific information
which might, in turn, depend on unpredictable system inputs.

Figure 3. Triplicated voters and modules forming one triple
modular-redundant stage of a system, with voting at module inputs.

A simple fault-detection mechanism is the time-out check. An event
failing to take place within some predefined time interval usually
indicates a fault (an event can be a single data transfer or an entire
process step). Such occurrences can be monitored by a “watchdog timer”
set at the beginning of each event to time-out after some time T_max,
interrupt the system, and signal an error. If the event completes
before T_max has elapsed, the timer is stopped and reset for
the next event.
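
The watchdog behavior just described can be sketched with a simulated clock. This is an assumption-laden illustration: the class name, the integer time units, and the polling `tick` method are hypothetical, and a real watchdog would raise a hardware interrupt instead of setting a flag.

```python
class WatchdogTimer:
    """Time-out check: flags an error if an event runs for t_max or longer."""

    def __init__(self, t_max):
        self.t_max = t_max
        self.deadline = None   # None means no event is being timed
        self.error = False

    def start(self, now):
        """Arm the timer at the beginning of an event."""
        self.deadline = now + self.t_max

    def stop(self):
        """Event completed in time: disarm until the next event starts."""
        self.deadline = None

    def tick(self, now):
        """Advance simulated time; latch an error if the deadline has passed."""
        if self.deadline is not None and now >= self.deadline:
            self.error = True
            self.deadline = None
        return self.error


wd = WatchdogTimer(t_max=5)
wd.start(now=0)
on_time = wd.tick(now=3)    # event still within its interval
wd.stop()                   # event finished, timer reset
wd.start(now=10)            # next event begins
timed_out = wd.tick(now=15) # event never completed
```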

Error correction without massive redundancy is difficult. However, for
many transient faults, simple repetition of an operation after the
fault disappears may produce correct results, provided the system state
can be restored to the beginning of the operation. Many processors
support single-instruction retry, with facilities to detect errors and
save and restore register values. Several microprocessors also support
bus-cycle retries, which can be performed with minimal saving of
information. In both cases, hard faults are signaled if errors persist
after some maximum number of
retries.
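
The retry scheme above can be sketched in software: save the state, attempt the operation, restore the state on a detected error, and signal a hard fault once the retry limit is reached. The exception names, the dictionary used as "register state," and the default limit of three retries are assumptions made for illustration.

```python
class TransientError(Exception):
    """Raised by an operation when an error is detected during execution."""


class HardFault(Exception):
    """Raised when errors persist after the maximum number of retries."""


def run_with_retry(operation, state, max_retries=3):
    """Run operation(state); on a detected error restore state and retry."""
    saved = dict(state)                  # save values before the operation
    for _attempt in range(max_retries + 1):
        try:
            return operation(state)
        except TransientError:
            state.clear()
            state.update(saved)          # restore to the start of the operation
    raise HardFault("error persisted after %d retries" % max_retries)


def make_flaky_increment(failures):
    """Hypothetical operation that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def operation(state):
        state["acc"] += 1                # partial result written before the error
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TransientError()
        return state["acc"]

    return operation
```

Restoring the saved state before each retry matters here because the failing operation has already modified `acc`; without the restore, each retry would start from a corrupted value.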

Fault containment. To protect critical system resources and minimize
recovery time, errors must be confined to the module in which they
originate. Typically, error-containment boundaries are hierarchically
defined, with errors confined at the lowest level to single replaceable
or repairable modules, and additional boundaries set around subsystems
containing these modules. Johnson’s excellent case study of
fault-containment boundary definition and support describes the
establishment of containment boundaries around buses, processors, and
memory modules in the former Intel iAPX 432 family. 10

Containment boundaries can be established in two ways: Each module can
check its own outputs, or each can check all incoming information. The
most common approach is to require each module to suspect all incoming
information and correct or block faulty data at the module interface.
Voters in software 7 or hardware 11 are used in the logical
configuration shown in Figure 3.

If a module is to be responsible for its own output, it needs an
error-containment boundary. An error detection or correction circuit,
such as a voter, a comparator, or a code checker, is placed at the
interface between the module and the system bus or communications
channel, along with a circuit capable of disabling the module’s output.
If error correction is not possible, a faulty module must be isolated
to prevent error propagation; its process is effectively halted. A
disadvantage in this configuration is that the module interface often
cannot protect the system from failures of the interface circuits
themselves.

Reconfiguration and repair. A system is repaired either by replacing
the failed module with a spare or by reconfiguring the system structure
or work load distribution to circumvent the module. Module replacement
restores the system to full operation but requires redundant modules
not used for normal operations.

Many reconfiguration strategies use all system components to perform
useful work. When a fault occurs, system performance is degraded by
redistributing the work load among the remaining resources. Or system
redundancy can be reduced, affecting subsequent fault tolerance. The
space shuttle computer complex is an example of the latter strategy. It
uses four processors with majority voting for critical operations. 8
Voting continues after one failure, but a second failure ends voting
and a single processor performs all remaining operations.

A failed module may be physically or logically removed from a system.
Logical removal is accomplished by switching off the module’s power,
forcing its output into an inactive state, or instructing all units to
ignore or bypass it.

Replacement units can be either “hot” or “cold.” A hot spare
concurrently performs the same operations as the module it is to
replace, needing no initialization when it is switched into the system.
A cold spare is either not powered or used for other tasks, requiring
initialization when switched into the system. System designers must
weigh the cost of unused spares against that of initialization time
when deciding between hot or cold spares.

If a failed module is not replaced, system operation degrades as work
is distributed among remaining resources. In multiprocessors and other
parallel processing systems, tasks are typically distributed across the
available processors, so that processor loss only reduces system
throughput. 12 This happens in commercial transaction-processing
multiprocessors advertised to operate continuously in the presence of
faults. In these systems, all critical data is replicated or otherwise
protected to facilitate transfer of operations between processors.
Special care is taken to duplicate global data or provide other
redundant information to allow corrupted data to be repaired. Global
data usually resides in shared memory or in “mirrored” disk volumes
(duplicated disk drives and controllers accessible by multiple
processors). In massively parallel machines or cellular arrays with
complex interconnection architectures, algorithms reassign tasks and
reroute communications to bypass faulty processing cells for graceful
degradation of system operation. 13

System recovery. If an unmasked error has propagated through a system
or if system hardware or software has been reconfigured, a recovery
period is needed to correct the system. The elapsed time between the
occurrence and the detection of an error determines the amount of
damage and the length of the recovery period.

Most system-recovery schemes restore system operation to a previous
correct state or recovery point. A processor is rolled back to a
recovery point by restoring registers and memories to the saved state
and invalidating cache memories, forcing cached data to be restored
from global memory. Global data is typically protected through
redundant protocols that allow updates to be completed or undone and
repeated following a failure. In shared-memory multiprocessor
systems, 12 global data and lists of tasks to be performed are kept in
shared memory, allowing processors to continue automatically with tasks
on the list as failed processors are disabled. This approach also helps
balance loads on the individual processors.

In loosely coupled systems, spare processors are periodically updated
at predefined checkpoints, so that when a spare is given control of a
task after failure of a master processor, processing can continue from
the most recent checkpoint rather than from the beginning of the task.
The degree of rollback is limited by using atomic actions: small,
indivisible processing steps completed and verified before global
updates and the next action. Recovery from a failure occurring before
saving the results is usually performed by
repeating the entire action.
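
Checkpointing with atomic actions can be sketched as follows. Each action runs on a private copy of the state and is committed, together with a new checkpoint, only after it completes; a failure before the commit leaves the checkpoint untouched, and the action is repeated from it. The class name, the use of `RuntimeError` as the detected failure, and the attempt limit are assumptions for illustration.

```python
import copy


class CheckpointedTask:
    """Task state plus the most recent checkpoint of that state."""

    def __init__(self, initial_state):
        self.state = copy.deepcopy(initial_state)
        self.checkpoint = copy.deepcopy(initial_state)
        self.completed = 0               # number of committed atomic actions

    def run_atomic(self, action):
        working = copy.deepcopy(self.state)
        action(working)                  # may raise before anything is committed
        self.state = working             # commit the verified result
        self.checkpoint = copy.deepcopy(working)
        self.completed += 1

    def rollback(self):
        self.state = copy.deepcopy(self.checkpoint)


def run_with_recovery(task, actions, max_attempts=3):
    """Run actions in order, repeating an action from the checkpoint on failure."""
    while task.completed < len(actions):
        action = actions[task.completed]
        for _attempt in range(max_attempts):
            try:
                task.run_atomic(action)
                break
            except RuntimeError:
                task.rollback()
        else:
            raise RuntimeError("action %d failed repeatedly" % task.completed)
    return task.state
```

Keeping actions small bounds the work lost to a rollback: only the interrupted action is repeated, not the whole task.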


Computer architectures are changing rapidly, with increased integration
in VLSI devices, new parallel processing architectures, and widely
distributed networks presenting new challenges to fault-tolerant-design
engineers. Much previous work in fault-tolerant-hardware design focused
on gate-level approaches, but now more work is needed at much higher
levels of abstraction, making complete design validation more
difficult. Consequently, new approaches and tools must be developed for
fault-tolerant design, simulation, and reliability analysis.

Large systolic arrays, massively parallel architectures, and other
large-scale distributed systems with complex interconnection networks
present challenges in system control, performance, and fault tolerance.
Engineers working on communications structures and algorithms for
mapping applications onto systolic arrays and other cellular parallel
systems are also developing extensions to detect and diagnose faulty
cells and circumvent them in real time.

Most of the fundamental concepts discussed here deal primarily with
localized rather than system-wide fault tolerance. Localized strategies
are easy to understand and apply. System-level fault tolerance requires
considerable work, especially in wafer-scale systems and other highly
integrated systems, which are subject to multiple component failures.
System-level fault tolerance is also a challenge in distributed systems
subject to synchronization problems and global upset, especially in
aerospace, military, and other applications where external disturbances
are likely. The challenge in commercial applications is to provide
fault tolerance that is both dependable and affordable.


Fault Tolerance in Commercial Computers*


The ultimate goals of a computer system affect its design philosophy
and design trade-offs. The cost of fault tolerance must be weighed
against the cost of error or failure. Error costs include downtime as
well as incorrect computation. Some system goals that affect design
philosophy are determined by the answers to the following questions:

(1) Is the system to be highly reliable or 
highly available? 

(2) Do all outputs have to be correct, or 
only data committed to long-term storage? 

(3) How familiar must the user be with 
the architecture and software redundancy? 

(4) Is the system dedicated so that attributes of the application can
be used to simplify fault tolerance techniques?

(5) Is the system constrained to use 
existing components? 

(6) Even if the design is new, what cost 
and/or performance penalty does it impose 
on the user who does not require fault 
tolerance? 

(7) Is the system stand-alone, or can 
other processors be called upon to assist in 
times of failure? 

This article sets forth a taxonomy of fault tolerance in commercial
computers. An example of each class in the taxonomy is presented, as
well as its approach to answering the above questions.

Daniel P. Siewiorek
Carnegie Mellon University

Now routine in general-purpose computing as well as in critical
applications, error-handling capabilities must be balanced carefully
with overall system goals.

*The material in this article is based in part on the Introduction to
Part II of The Theory and Practice of Reliable System Design by D.P.
Siewiorek and R.S. Swarz, to be published in 1991 by Digital Press.


A taxonomy of fault tolerance in commercial systems

The taxonomy is composed of three orthogonal axes: the sources of
errors the computer tolerates, the computer’s approach to tolerating
errors, and the computer’s structure.

Error sources. The stages in the progress of a computer from concept to
final implementation include specification of input/output
relationships, logic design, prototype debugging, manufacturing,
installation, and field operation. Deviations from intended behavior,
or errors, can occur at any stage as results of incomplete
specifications, incorrect translation of specifications into logic
design, and assembly mistakes during prototyping or manufacturing.
During the system’s operational life, errors can result from changes in
its physical state or damage to hardware. Physical changes may be
triggered by environmental factors such as fluctuations in temperature
or power supply voltage, static discharge, and even alpha particle
emissions. Inconsistent states can also be caused by operator errors
and by design errors in hardware or software.

Table 1. Probability of operational outage from various sources.

                   AT&T                    Japanese
                   Switching  Bellcore     Commercial  Tandem  Tandem  Northern  Mainframe
Source             Systems    Commercial   Users       1985    1987    Telecom   Users
Hardware           0.20       0.26         0.75*       0.18    0.19    0.19      0.45
Software           0.15       0.30         0.75*       0.26    0.43    0.19      0.20
Procedural error   -          -            -           -       -       0.333     -
Maintenance        -          -            0.75*       0.25    0.13    -         0.05
Operations         0.65       0.44         0.11        0.17    0.13    0.33      0.15
Environment        -          -            0.13        0.14    0.12    0.28      0.15
Power              -          -            -           -       -       0.125     -

* The sum of these three sources was reported as 0.75.

One way to categorize fault-tolerant systems is to identify the class
of faults each is designed to tolerate, which is the first axis of our
taxonomy. Using data gathered from several studies, Table 1 illustrates
the distribution of sources of system outages. 1 If we consider
software problems to be caused mainly by design errors, the sources of
outage are approximately evenly distributed over the main stages of a
system’s life: design, hardware manufacturing,* and operation
(including environment).

Any source of error can appear at any stage; however, it is usually
assumed that certain sources of error predominate at particular stages.
Error detection techniques can be tailored to the types of fault
sources manifested. Thus, at each stage of system life a particular
technique is primary.

Error-handling techniques. The second axis is the approach a system
uses to handle faults. A system may go through as many as 10 stages in
response to a failure. Designing a system involves the selection of a
coordinated response that combines some or all of these steps:

• fault confinement: limiting the spread of fault effects to one area
of the system;

• fault detection;

• fault masking: hiding the effects of failure;

• retry: a second attempt at an operation is often successful,
particularly in the case of a transient fault;

• diagnosis;

• reconfiguration: reconfiguring a component to isolate it;

• recovery: backing up system operation to the point prior to fault
detection;

• restart;

• repair;

• reintegration: placing a repaired module back in the system after
physical replacement of a component.


Computer structures. Three major architectures have been developed to
respond to failures. The simplest is a uniprocessor, composed of a
processor, memory, and input/output devices. Components can be
replicated to enhance performance and fault tolerance. If the entire
uniprocessor is replicated, the resultant structure is a multicomputer.
Each computer has its own operating system and communicates with the
other computers over a high-speed backplane. Any intercomputer
communication goes through several copies (at least sender and
receiver) of the operating system, greatly increasing the overhead of
cooperation. Multicomputers have been popular for fault-tolerant
systems because of the physical separation afforded by separately
packaging each computer.

In a multiprocessor structure the processor is replicated, but memory
and I/O devices are shared. All processors have equal access to the
shared resources, and programs can cooperate using minimum overhead
(that is, processors can check flags in memory without involving the
operating system). However, early detection and minimizing error
propagation are major concerns in fault-tolerant systems that use the
multiprocessor architecture.

Table 2 presents the commercial systems described in this article and
their approaches to fault tolerance. For simplicity, the table lists
only two stages in the response to a failure: detection and recovery.
In theory almost all stages of failure response can be implemented in
either hardware or software. In practice the earlier stages of fault
handling are performed in hardware because of concerns about fault
confinement. The ordering of the systems in Table 2 reflects their
similarities of techniques and capabilities.

Table 2. A taxonomy of fault tolerance techniques in commercial computing systems.

                                                      Sources of
Structure       Detection          Recovery           Failures Tolerated             Techniques
Uniprocessor
  VAX 8600      Hardware           Software           Hardware                       Hardware error detection
  IBM 3090      Hardware           Hardware/software  Hardware                       Hardware error detection, retry, workaround
Multicomputer
  Tandem        Hardware/software  Software           Hardware, design, environment  Checkpointing, “I’m alive” messages
  Stratus       Hardware           Hardware           Hardware, environment          Duplication and matching
  VAXft 3000    Hardware           Hardware           Hardware, environment          Duplication and matching
Multiprocessor
  Teradata      Hardware           Software           Hardware, environment          Duplication
  Sequoia       Hardware           Software           Hardware, environment          Duplication and matching

*To simplify discussion, we attribute all hardware failure to the
manufacturing stage even though some random failures and failures due
to aging hardware may be more appropriately assigned to the operations
stage. It is often difficult to determine whether the origin of a
failure is a manufacturing defect or operationally induced.

Uniprocessors 

VAX 8600. RAMP, the reliability, availability, and maintainability
program of the VAX 8600 32-bit computer, is representative of
contemporary general-purpose computing design. Some RAMP features are
defined in the system architecture and must appear in every
implementation. Other features are implementation specific.

An archetypal VAX implementation comprises a CPU connected to memory
and I/O devices by a backplane bus. Bus adapters convert I/O bus
protocols to the backplane bus protocol. A set of internal system
registers is associated with each subunit (cache, memory, translation
buffer, backplane, and so forth). Most subunits are associated with up
to four types of registers: configuration/control, status, data, and
diagnostic/maintenance. The configuration/control register contains
information on the state of the element (checking enabled, reporting
enabled, and so forth). The status register contains flags summarizing
the state of the element, including error reports. Data registers
capture relevant information about the system state when an error is
detected (for example, the address used on cache lookup when a cache
parity error is detected). Finally, the diagnostic/maintenance register
contains control and status information generated by the error
detection and correction logic.

The console subsystem of a VAX is a small computer that provides
control (halt, restart, initialize, and so on) over the CPU, as well as
access to internal system registers. It has a mass storage device
containing the main system bootstrap code and some diagnostics. The
console subsystem also has access to the visibility bus, which makes
almost 600 internal logic signal values visible to the
microdiagnostics.

The main memory in VAX systems is 
protected by error-correcting code (ECC). 
An optional backup battery preserves the 
contents of memory over short-term power 
failures. 
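
To illustrate how an error-correcting code locates and repairs a single flipped bit, here is a sketch of the small Hamming(7,4) code. It is a generic textbook example chosen for brevity; the article does not specify the code used in VAX memory, and practical memories use wider codes over full data words.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit code word laid out p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (data bits, syndrome)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                       # simulate a single memory cell fault
recovered, position = hamming74_decode(damaged)
```

The syndrome is nonzero only when a check fails, and its value is the position of the faulty bit, so correction is a single bit flip.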

Remote diagnosis. VAX systems have a port for remote diagnosis, an
integral part of the VAX maintenance philosophy and of most commercial
systems. The following is a typical VAX maintenance scenario:
Disk-resident user mode diagnostics periodically execute under the VMS
operating system. The goal of user mode diagnostics is to exercise and
detect functional errors in memory, bus adapters, device controllers,
and disk drives. Errors reported by user mode diagnostics or hardware
check circuits prompt a customer call to the diagnostic center (DC).
The customer replaces the removable disk media with diagnostic and
scratch disks, turns a key on the front of the console to Remote, and
calls the DC (unauthorized access is not possible).

The DC engineer calls the customer’s processor, logs on to the system,
and begins to execute a script of diagnostics. Micro- and
macrodiagnostics can be loaded from the diagnostic disk and executed.
The error log can be examined, memory locations deposited or examined,
and so on. If the diagnostic disk is not operable, the diagnostics can
be loaded from the console subsystem mass storage device or
downline-loaded over the phone. The DC will attempt to isolate the
failure to a subsystem. If the CPU is faulty, the diagnostics on the
console subsystem mass storage device are executed to verify the CPU
status.

The DC notifies the local field service office of the failing
subsystem. Upon arrival at the customer’s site, a field service
engineer replaces the faulty board and reverifies the system. If the
failing subsystem is the CPU, microdiagnostics are loaded into the
writable control store.

Remote diagnosis has at least three major advantages: (1) faster mean
time to repair, especially when the problem is trivial and can be
resolved over the remote link; (2) faster resolution of difficult
problems, because the person at the DC is an expert in VAX system fault
determination; and (3) greater certainty that the repair person is sent
to the site with the correct part in hand.

VAX 8600 error handling. The VAX 8600 consists of four parts: I box
(instruction fetch), E box (instruction execute), M box (memory
interface), and F box (floating point). Since the system is heavily
pipelined, the box structure provides unique opportunities and
challenges. For example, four copies of the general-purpose registers
are available for high-speed performance. Thus, if one general register
set detects a parity error, the error can easily be corrected by
updating from one of the other register sets. On the other hand, since
several partially executed instructions can be in the pipeline at any
given time, a detected error must be mapped to the appropriate
instruction, and that instruction must be backed up and retried without
disturbing the other partial results.

Many design decisions were applied 
uniformly across the boxes. The following 
is an overview of their common features: 

• All memories and buses are checked 
by parity. Furthermore, parity continuity is 
carried through all the major data paths. 


VAX 8600 error detection and 

correction features 

E box 

F box 

Parity 

Parity 

Microstore 

Microstore 

Microstack 

General-purpose registers 

General-purpose registers 

Decode memory 

Memory control function 

Other 

memory 

Self-tests during idle 

Buses parity checked 

Correction 

Operand bus 

Microstore 

Write bus 

Console bus 

Correction 

Microstore 

M box 

General-purpose register copy 

Parity 

Instruction retry 

Microstore 

Data path parity checked 

Microstack 

Output parity of multiple ALUs 

Microaddress 

Shifter 

Cache (tag, data, written bits) 

AMUX, active source recorded 

Translation buffer (tag, data, 

BMUX, active source recorded 

valid bits) 

Other 

Cycle condition code 

CPU keep-alive signal 

Buses parity checked 

Error insertion—static and 

Array bus 

dynamic 

Adapter bus 

Other error detection 

1 box 

Microstack overflow/underflow 

Parity 

Force bad parity 

Microstore 

Enable/disable error detection 

General-purpose registers 

and reporting mechanisms 

Instruction buffer 

Multiple errors while servicing 

Decode memory 

first error 

R log 

Correction 

Data path 

ECC on main memory: 

Buses parity checked 

cache over data, address 

Operand bus 

parity, and bad data bit 

Write bus 

Write back corrected data on 

Correction 

error 

Microstore 


Parity is kept not only for data but also for 
physical addresses and microcode. The 
console processor corrects bad data in a 
writable memory, such as a control store or 
table lookup constants, from the files it 
uses during machine initialization. 

• Address parity and a bad data flag are 
“folded” into the ECC for main memory 
and cache. Thus, not only incorrectly ac¬ 
cessed words but also data that was stored 
corrupted can be identified. 

• Instruction retry is used to recover 
from transient and intermittent errors. 

• Errors are dynamically logged and 
analyzed by the Standard Package for Er¬ 
ror Analysis and Reporting (SPEAR). By 
analyzing trends, SPEAR more accurately 


isolates failures to field-replaceable units. 

• A diagnostic bus gives the console 
access to hundreds of internal logic sig¬ 
nals. 

• The environmental monitoring mod¬ 
ule (EMM) determines that all boards are 
in the proper place through electronic 
keying. It also detects the ambient tem¬ 
perature of incoming air and the tempera¬ 
ture gradient across the card cage. An 
overheated regulator, a failed blower, in¬ 
adequate airflow, inadequate output volt¬ 
age, or a dangerously high temperature 
gradient will cause the EMM to shut the 
system down. 

The list above shows the RAMP features
of the four VAX 8600 boxes. More details
can be found in an article by Bruckert and
Josephson. 2

IBM 3090 series fault tolerance features

Reliability

Low intrinsic failure rate technology
Extensive component burn-in during manufacture
Dual-processor controller incorporates switchover
Dual 3370 direct-access storage units support switchover
Multiple consoles for monitoring processor activity and for backup
LSI packaging reduces number of circuit connections
Internal machine power and temperature monitoring
Chip sparing in memory replaces defective chips automatically

Availability

Two or four central processors
Automatic error detection and correction in central and expanded storage:
• Single-bit error correction and double-bit error detection in central storage
• Double-bit error correction and triple-bit error detection in expanded storage
Storage deallocation in 4-Kbyte increments under system program control
Ability to vary channels off line in one-channel increments
Instruction retry
Channel command retry
Error detection and fault isolation circuits improve recovery and serviceability
Multipath I/O controllers and units

Data integrity

Key-controlled storage protection (store and fetch)
Critical-address storage protection
Storage error checking and correction
Processor cache error handling
Parity and other internal error checking
Segment protection (S/370 mode)
Page protection (S/370 mode)
Clear reset of registers and main storage
Automatic remote support authorization
Block multiplexer channel command retry
Extensive I/O recovery by hardware and control programs

Serviceability

Automatic fault isolation (analysis routines) concurrent with operation
Automatic remote support capability: auto call to IBM if authorized by customer
Automatic customer engineer and parts dispatching
Trace facilities
Error logout recording
Microcode update distribution via remote support facilities
Remote service console capability
Automatic validation tests after repair
Customer problem analysis facilities

IBM 3090. IBM's system maintenance 
strategy focuses on failure recovery. Hard¬ 
ware circuits detect errors, and informa¬ 
tion about the current machine state is 
logged for hardware, microcode, and soft¬ 
ware recovery techniques. The informa¬ 
tion is also logged at a remote support
facility to assist engineering design activ¬ 
ity. Four stages of corrective action are 
used, each with a successively larger im¬ 
pact on users: transparent recovery, one 
user affected, multiple users affected, and 
down. This recovery structure is common 
in systems with the goal of high availabil¬ 
ity or in real-time data-processing environ¬ 
ments in which temporary loss of data is 
tolerable. 

The box above gives an overview of
IBM 3090 series system features. The 
hardware error detection, error correction, 
and monitoring circuits listed are used in a 
maintenance scenario similar to that de¬ 
scribed for the VAX 8600. Historical per¬ 
spectives on the reliability, availability, 
and serviceability of IBM systems can be 
found in articles by Hsiao et al. and by 
Droulette. 3,4 

Multicomputers 

Tandem. Tandem Computers was 
founded in 1976 to build high-availability 
computer systems for commercial transac¬ 
tion processing. The Tandem NonStop I 
was the first commercially available 
modularly expandable system designed 
specifically for high availability. Design 
objectives for Tandem systems include 

• Nonstop operation: Failures are de¬ 
tected, components are reconfigured out of 
service, and repaired components are con¬ 
figured back into the system without stop¬ 
ping the other system components. 

• Data integrity: No single hardware 
failure can compromise the data integrity 
of the system. 

• Modular system expansion: Process¬ 
ing power, memory, and peripherals are 
added without impacting applications software.

Tandem systems eliminate single points 
of failure by means of dual paths to all 
system elements (including disks and I/O 
controllers), processor replication, redun¬ 
dant power supplies, mirrored disks (two 
identical copies of files on two independ¬ 
ent disk drives), and a message-based 
operating system. 

Systems are composed of up to 16 
computers interconnected by two mes¬ 
sage-oriented buses (Dynabus), as de¬ 
picted in Figure 1. System designers chose 
a loosely coupled architecture instead of a 
tightly coupled, shared-memory architec¬ 
ture because they believed the former al¬ 
lows more complete fault containment. 
Upon this hardware structure the software 
builds a process-oriented system with all 
communications handled as messages. 
This abstraction allows the blurring of the 
physical boundaries between processors 
and peripherals. Any I/O device or re¬ 
source in the system can be accessed by a 
process, no matter where the resource and 
process reside. 

Figure 1. Tandem’s system organization.

The hardware and software modules are
designed to be “fast-fail,” that is, to terminate
processing immediately after de¬
tection of errors. Techniques employed in 
hardware modules include checksums on 
bus messages, parity on data paths, error- 
correcting-code memory, and watchdog 
timers. Software modules employ consis¬ 
tency checks and defensive programming 
techniques. 

Tandem systems use retry extensively to 
access I/O devices. First, hardware and/or 
firmware retry the access, assuming a 
temporary fault. Next, software retries, 
followed by alternative path retry and fi¬ 
nally alternative device retry. 
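
The escalation order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Tandem code: the stage names come from the text, while `AccessError`, `escalating_access`, and the stand-in access functions are hypothetical.

```python
from typing import Callable, Sequence, Tuple


class AccessError(Exception):
    """Raised when one attempt to reach an I/O device fails (hypothetical)."""


def escalating_access(
    stages: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each stage in order and return (stage name, data) for the first success."""
    for name, attempt in stages:
        try:
            return name, attempt()
        except AccessError:
            continue  # escalate to the next, more disruptive stage
    raise AccessError("all retry stages failed")


def failing() -> str:
    # Stand-in for an access that keeps failing on the original path.
    raise AccessError("device did not respond")


stages = [
    ("hardware/firmware retry", failing),
    ("software retry", failing),
    ("alternative path retry", lambda: "block 42"),
    ("alternative device retry", lambda: "block 42 (mirror)"),
]
winner, data = escalating_access(stages)
```

Cheap, local remedies are tried first on the assumption that the fault is temporary; the more disruptive alternatives are reached only when the earlier stages have all failed.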

Data integrity is maintained through the 
mechanism of I/O process pairs; one I/O 
process is designated as primary, the other 
as backup. All modification messages are 
delivered to the primary I/O process. The 
primary sends a message with checkpoint 
information to the backup so that it can 
take over if the primary’s processor or 
access path to the I/O device fails. Files can 
also be duplicated on physically distinct 
devices controlled by an I/O process pair 
on physically distinct processors. Thus, in 
the event of physical failure or isolation of 
the primary, the backup file is up to date 
and available. 

User applications can also use the pro¬ 
cess pair mechanism. Consider a nonstop 
application program X. X starts a backup 
process Y in another processor. There are 
also duplicate file images, one designated 
primary and the other backup. Program X 
periodically (at user-specific points) sends 
checkpoint information to Y. Y is the same 
program as X but knows that it is a backup 
program. Y reads checkpoint messages to 
update its data area, file status, and pro¬ 
gram counter. 

The checkpoint information is inserted 
in the corresponding memory locations of 
the backup process, as opposed to the more 
usual approach of updating a disk file. This 
approach permits the backup process to 
take over immediately in the event of fail¬ 
ure, without having to perform the usual 
recovery journaling and disk accesses 
before processing resumes. 

Program Y loads and executes if the 
system reports that X’s processor is 
down—that is, if an error message is sent 
from X’s operating system image or if X’s 
processor fails to respond to a periodic 
“I’m alive” message. All file activity by X 
is performed on both the primary and 
backup file copies. When Y starts to exe¬ 
cute from the last checkpoints, it may at¬ 
tempt to repeat I/O operations successfully 
completed by X. The system file handler 
will recognize this and send Y a “successfully
completed I/O” message. Y periodi¬
cally asks the operating system if a backup 
process exists. Since one no longer does, Y 
can request the creation and initialization 
of a copy of both the process and file 
structure. More information on the Tan¬ 
dem operating system and the program¬ 
ming of nonstop applications can be found 
in an article by Bartlett. 5 
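
The interplay of checkpoints, takeover, and repeated I/O can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the class and function names (`Process`, `FileHandler`, `run`) are hypothetical, the checkpoint is reduced to a single "next operation" counter, and the failure of X's processor is simulated by simply stopping X.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One member of a process pair; state stands in for data area and program counter."""
    name: str
    state: dict = field(default_factory=lambda: {"next_op": 1})


class FileHandler:
    """Hypothetical stand-in for the system file handler."""

    def __init__(self) -> None:
        self.done = set()
        self.log = []

    def write(self, op_id: int, record: str) -> str:
        if op_id in self.done:
            # The backup is repeating an operation the primary already finished.
            return "successfully completed I/O"
        self.done.add(op_id)
        self.log.append(record)
        return "written"


def run(proc, handler, last_op, backup=None, fail_after=None):
    """Execute operations up to last_op, checkpointing to the backup after each one."""
    results = []
    while proc.state["next_op"] <= last_op:
        op = proc.state["next_op"]
        results.append(handler.write(op, f"record-{op}"))
        if op == fail_after:
            break  # processor goes down before the next checkpoint is sent
        proc.state["next_op"] = op + 1
        if backup is not None:
            # The checkpoint is placed directly in the backup's memory, not on disk.
            backup.state = dict(proc.state)
    return results


handler = FileHandler()
x, y = Process("X"), Process("Y")
x_results = run(x, handler, last_op=3, backup=y, fail_after=2)
y_results = run(y, handler, last_op=3)  # Y takes over from the last checkpoint
```

In this run X completes operation 2 but fails before checkpointing it, so Y repeats operation 2 and receives the "successfully completed I/O" reply instead of writing the record twice.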

Tandem’s Network Systems Manage¬ 
ment Program provides a set of operators 
that help reduce the number of administra¬ 
tive errors typically encountered in complex
systems. The maintenance and diag¬
nostic system analyzes event logs to call 
out failed field-replaceable units success¬ 
fully 90 percent of the time. 

Available networking software allows 
interconnection of up to 255 geographi¬ 
cally dispersed Tandem systems. Tandem 
applications include order entry, hospital 
records, bank transactions, and library 
transactions. 

The overall Tandem architecture has
remained constant through the seven processor
upgrades and one Dynabus performance
upgrade shown in Table 3. Bartlett et
al. and Gray provide further details. 6,7

Table 3. Evolution of Tandem systems.

Columns, left to right: NonStop I | NonStop II | TXP | VLX | CLX 600 | CLX 700 | Cyclone

Year: 1976 | 1981 | 1983 | 1986 | 1987 | 1989 | 1989

Processor
MIPS/IPU: 0.7 | 0.8 | 2.0 | 3.0 | 1.0 | 1.5 | 10.0
Instructions: 173 | 285 | 285 | 285 | 306 | 306 | 306
Technology: MSI | MSI STTL | MSI, Fast PAL | ECL gate array | Custom 2 µ CMOS | Custom 1.5 µ CMOS | ECL gate array
Cycle time: 100 ns | 100 ns | 83 ns | 83 ns | 133 ns | 91 ns | 45 ns
Microstore: - | 8k x 4B | Two-level: 8k x 5B, 4k x 10B | 10k x 15B dual | 14k x 7B | 14k x 7B | 8k x 20B std + 8k x 20B pairs
Cache (data & instructions): - | - | 64 KB direct map | 64 KB direct map | 64 KB direct map | 128 KB direct map | 2 x 64 KB direct map
Gates (approx.): 20k | 30k | 58k | 86k | 81k | 81k | 275k
Proc. boards: 2 | 3 | 4 | 2 | 1 | 1 | 3
Procs./system: 2-16 | 2-16 | 2-16 | 2-16 | 1-6 | 2-8 | 2-16

Memory
Virtual: 512 KB | 1 GB | 1 GB | 1 GB | 1 GB | 1 GB | 2 GB
Physical: 2 MB | 16 MB | 16 MB | 256 MB | 32 MB | 32 MB | 2 GB
Per board: 64 KB, 384 KB | 512 KB, 2 MB | 2 MB, 8 MB | 8 MB, 16 MB, 48 MB | 4 MB (on processor board), 8 MB | 8 MB (on processor board), 8 MB | 32 MB, 64 MB
Max. boards: 2 | 2 | 4 | 2 | 1 | 1 | 2
Cycle time: 500 ns/2B | 400 ns/2B | 666 ns/8B | 416 ns/8B | 933 ns/8B | 637 ns/8B | 495 ns/16B
(+225 ns if CAM miss)

Input/output
Interprocessor bus speed: 2 x 13 MB/s | 2 x 13 MB/s | 2 x 13 MB/s | 2 x 20 MB/s | 2 x 20 MB/s | 2 x 20 MB/s | 2 x 20 MB/s
Channel speed: 4 MB/s | 5 MB/s | 5 MB/s | 5 MB/s | 3 MB/s | 4.4 MB/s | 2 x 5 MB/s

- = no entry
CAM = Content-addressable memory
B = Byte
IPU = Instruction processing unit
PAL = Programmable array logic
ECL = Emitter-coupled logic
MSI = Medium-scale integration

Stratus. Stratus Computers, founded in 
1980, uses an alternative to the Tandem 
approach for on-line transaction process¬ 
ing by employing single-chip micropro¬ 
cessors. The design goal is continuous 
processing, which Stratus defines as unin¬ 
terrupted operation without data loss, performance
degradation, or special program¬
ming. Stratus systems use continuous 
checking between duplexed components 
for detection of errors at the actual points 
of failure. 

The Stratus self-checking, duplicate- 
and-match architecture is shown in Figure 
2. A module (or computer) is composed of 
replicated power and backplane buses 
(StrataBus) into which a variety of boards
can be inserted. Up to 32 modules can be 
interconnected to form a system via a 
message-passing Stratus Intermodule Bus 
(SIB). Access to the SIB takes place via 
dual 14-Mbyte-per-second links. Systems, 
in turn, are tied together by an X.25 packet- 
switched network. 

Figure 2. The Stratus pair-and-spare architecture.

Now consider how the system in Figure
2 handles failure. The two processor
boards (each containing a pair of microprocessors)
are self-checking modules
used in a pair-and-spare configuration. 
Each board operates independently. Each 
half of each board (for example, the A half) 
receives inputs from its own bus (bus A) 
and drives its own bus (bus A). Each bus is 
the wired-OR of one half of each board 
(bus A is the wired-OR of all A board 
halves). The boards constantly compare 
their two halves; upon disagreement, a 
board removes itself from service, a main¬ 
tenance interrupt is generated, and a red 
light is turned on. 

The spare pair on the other processor
board continues processing and is now the 
sole driver of both buses. The operating 
system executes a diagnostic on the failed 
board to determine whether the error was 
due to a transient or permanent fault. In the 
case of a transient fault, the board is re¬ 
turned to service. Permanent faults are 
reported by phone to a customer assistance 
center. The CAC reconfirms the problem, 
selects a replacement board of the same 
revision, prints installation instructions, 
and ships the board by overnight courier. 

The user, learning of the problem only 
when the board is delivered, removes the
old board and inserts the new board with¬ 
out disrupting the system (this replace¬ 
ment is known as a hot swap). The new 
board interrupts the system, and the pro¬ 
cessor that has been running brings the 
replacement into synchronization, making 
the full configuration available again. 
Detection and recovery are transparent to 
the application software. 
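
The pair-and-spare behavior can be modeled in software, with the caveat that the real mechanism is hardware comparison on wired-OR buses. The following Python sketch only mirrors the logic: `SelfCheckingBoard`, `module_output`, and the sample functions are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional


class SelfCheckingBoard:
    """A board whose two halves compute the same result and are compared."""

    def __init__(
        self,
        name: str,
        half_a: Callable[[int], int],
        half_b: Callable[[int], int],
    ) -> None:
        self.name = name
        self.half_a = half_a
        self.half_b = half_b
        self.in_service = True

    def step(self, x: int) -> Optional[int]:
        if not self.in_service:
            return None
        a, b = self.half_a(x), self.half_b(x)
        if a != b:
            # Disagreement: the board removes itself rather than drive bad data.
            self.in_service = False
            return None
        return a


def module_output(boards: List[SelfCheckingBoard], x: int) -> int:
    """Return the value supplied by whichever boards are still in service."""
    outputs = []
    for board in boards:
        result = board.step(x)
        if result is not None:
            outputs.append(result)
    if not outputs:
        raise RuntimeError("no board in service")
    return outputs[0]


double = lambda x: 2 * x
glitchy = lambda x: 2 * x + (1 if x == 3 else 0)  # half B misbehaves on one input

primary = SelfCheckingBoard("board 1", double, glitchy)
spare = SelfCheckingBoard("board 2", double, double)
results = [module_output([primary, spare], x) for x in (2, 3, 5)]
```

The caller never sees the faulty value: when the halves of the first board disagree, that board drops out and the spare's result is used with no recovery step in between.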

Detection and recovery procedures for 
other system components are similar, al¬ 
though full implementation of the pair- 
and-spare strategy is restricted to the pro¬ 
cessor and memory. The disk controllers 
contain duplicate read/write circuitry. The 
communications controllers are also self¬ 
checking. The memory controllers, in 
addition, monitor the bus for parity errors. 
These controllers can declare a bus broken 
and instruct all boards to stop using it. 
Other boards monitor the bus for data di¬ 
rected to them. If the board detects an 
inconsistency but the memory controllers 
have not declared the bus broken, the board 
assumes that its bus receivers have failed 
and declares itself failed. 

The Stratus hardware approach is attrac¬ 
tive in that it does not require on-line re¬ 
covery from faults. The spare component 
continues processing until its faulty 
counterpart can be replaced. No data errors 
are injected into the system, so no software 
recovery mechanisms are required for the 
pair-and-spare components. Complexities 
due to checkpointing, restart program¬ 
ming, and other software fault tolerance 
considerations are eliminated. In addition 
to being easy to program, the Stratus ap¬ 
proach to maintenance reduces yearly 
service cost to 6 percent of life cycle cost, 
compared to an industrial average of 9 
percent. 

As in all fault-tolerant systems, certain 
combinations of rare events can cause the 
system to fail. For example, multiple fail¬ 
ures affecting the two independent halves 
of a board could cause the module to hang 
as it alternates between buses seeking a 
fault-free path. Furthermore, Stratus uses a 
single system clock. 

Like the Tandem architecture, the Stra¬ 
tus architecture has remained essentially 
constant through the evolution of its sys¬ 
tems, listed in Table 4. Webber provides 
further details about Stratus systems. 8 

Table 4. Evolution of Stratus systems.

Year  System               Significant features

1981  FT200                2-CPU, 68000-based (2 logical CPUs/board)
                           Up to 16 Mbytes of memory
                           User and executive CPUs (not symmetric)
                           20 slots in main chassis

1984  XA400                4-CPU, 68010-based (4 logical CPUs/board)
                           Symmetric multiprocessing

1984  XA600                6-CPU, 68010-based (6 logical CPUs/board)
                           Symmetric multiprocessing
                           8-Kbyte cache per CPU
                           Floating-point assist in hardware
                           40 slots in main chassis

1987  XA2000               6-CPU, 68020-based (1 logical CPU/board)
      Model Nos. 110-160   Up to 96 Mbytes of memory
                           64-Kbyte cache per CPU
                           Floating-point coprocessor
                           Dynamic processor upgrades
                           Power failure ride-through

1988  XA2000               Generalized I/O controller
      Model Nos. 50-70     10-slot chassis
                           Fault-tolerant I/O communications bus

VAXft 3000. The VAXft 3000 was introduced
in 1990 to serve as a fault-tolerant
stand-alone processor, as a processor in a
cluster, or as a front end to high-availability
VAX cluster systems. Data-capture
front ends must correctly record real-time
data (such as data generated by transaction
processing and by manufacturing or labo¬ 
ratory monitoring) and recover in micro¬ 
seconds if a fault occurs. Once the data has 
been captured, it can be handed to high- 
availability clusters. A VAXft 3000 front 
end combined with a traditional VAX clus¬ 
ter back end can be more cost-effective 
than a large-scale system that uses hard¬ 
ware fault tolerance uniformly throughout. 

The design objectives for the VAXft 
3000 were similar to those of the Tandem 
and Stratus systems: no single point of 
failure, on-line repair, self-checking 
checkers, and tolerance of power interrup¬ 
tions. An additional goal was to minimize 
the possibility of an error during repair. 

Figure 3 presents an overview of the 
VAXft 3000. The system is composed of 
two complete computers, identified as 
zone A and zone B, each with its own 
processor, memory, backplane, cabinet, 
and uninterruptible power supply. If a 
physical failure occurs in one zone, that 
zone can be powered down, repaired, and 
resynchronized without physically dis¬ 
turbing the on-line zone. A duplicated 
cross-link carries data and timing signals 
to ensure that the two zones operate syn¬ 
chronously. The cross-links provide the
communication path between the proces¬ 
sor and I/O devices. Their basic function is 
to make I/O devices in both zones see a 
single processor and to make the proces¬ 
sors believe that all I/O devices reside in a 
single zone. 

The main error-checking method in each 
zone is duplication and matching. The 
processor (CVAX), memory controller, 
cross-link, and fire wall are all duplicated. 
Error detection and correction codes are 
used on nonduplicated elements such as 
the memory and the I/O adapter. Master/ 
slave checking is performed by the mem¬ 
ory controller (the gateway to the proces¬ 
sors and memory) and the fire wall (the 
gateway to the I/O devices). 

The memory controllers compare the 
data from both processors and forward a 
single copy to the memory. Both copies of 
the address bits, the ECC bits, and the 
control signals are also forwarded to the 
memory. The memory module generates 
the ECC bits from the data and compares 
them with the ECC bits received. In addi¬ 
tion, the address and control signals for 
both copies must match for all transitions 
during the memory operation. 

Input/output is performed via message 
packets exchanged between the processor
and various I/O devices. Packets include 
the error detection code (EDC) normally 
associated with the I/O device (X.25, Eth¬ 
ernet, and others). Packets are formulated 
in a replicated region of the system and 
their data and EDC contents are compared 
by the fire walls before release to the nonreplicated
I/O section. The I/O devices
store the EDC with the data so that when it 
is retrieved, the replicated fire walls inde¬ 
pendently calculate the EDC bits from the 
data and compare them not only with each 
other but also with the stored EDC bits. 
Thus, the protection given by the repli¬ 
cated operation and by the EDC are over¬ 
lapped so that the data is never left unpro¬ 
tected. As the packets progress through the 
replicated portions of the system, their 
values are recorded in trace memories. 
When the fire walls detect a mismatch, the 
origin of the error is quickly isolated by 
comparing the contents of the two trace 
memories and noting the first point of dis¬ 
agreement. 
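
Both checks described here, the three-way EDC comparison on retrieval and the search for the first disagreement between trace memories, are easy to sketch. The Python below is a simplified illustration: CRC-32 from the standard `zlib` module is used as a stand-in for whatever EDC the I/O device actually employs, and the function names and trace layout are hypothetical.

```python
import zlib
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def firewall_check(data_a: bytes, data_b: bytes, stored_edc: int) -> bool:
    """Each fire wall recomputes the EDC; both must agree with each other and with the stored value."""
    edc_a = zlib.crc32(data_a)
    edc_b = zlib.crc32(data_b)
    return edc_a == edc_b == stored_edc


def first_disagreement(
    trace_a: Sequence[Tuple[str, int]],
    trace_b: Sequence[Tuple[str, int]],
) -> Optional[int]:
    """Return the index of the first differing trace entry, or None if the traces match."""
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a != b:
            return index
    return None


packet = b"order #1017: 3 units"
stored = zlib.crc32(packet)

trace_a = [("processor", 0x5A), ("memory controller", 0x5A), ("cross-link", 0x5A), ("fire wall", 0x5A)]
trace_b = [("processor", 0x5A), ("memory controller", 0x5A), ("cross-link", 0x5B), ("fire wall", 0x5B)]
origin = first_disagreement(trace_a, trace_b)
```

Here the traces first diverge at index 2, which points at the cross-link stage as the place where the error entered, even though the mismatch would only be noticed later at the fire walls.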

Upon detecting an error, the system re¬ 
tries the operation, using alternate paths to 
I/O if necessary. If the error persists or 
exceeds the set error rate threshold, the 
system reports the failures by phone to 
field service. After the repair, the contents 
of the running system are copied into the 
newly repaired system as a background 
activity of the operating system. All inter¬ 
vening write operations are transmitted via 
the cross-links to update locations that 
have changed since the copying occurred. 
After the copying is completed, the operat¬ 
ing system places its state information into 
main memory, forcing a current state 
into the newly resynchronized memory. 
A hardware reset is issued and both 
zones restore state and resume lockstep 
operation. 

Figure 3. The VAXft 3000 architecture.

Multiprocessors

Teradata. Teradata was founded in 1979
to apply microprocessor technology to
terabyte-size databases. The company’s
DBC/1012 system is composed of four
module types. Access module processors
(AMPs) perform database operations and
access disk storage units (DSUs). Connections
to the database computer are made
through interface processors (IFPs) to host
computers and through communication
processors (COPs) to networks. The IFP,
COP, and AMP are composed of single-board
computers containing a computational
processor, an I/O processor, and a
numerical coprocessor. The processor
boards have evolved through three models
of the Intel 8086 family (8086, 80286, and 
80386). In the 80386 version a memory/ 
cache board contains from 4 to 8 Mbytes of 
ECC-protected main memory. 

The IFP, COP, and AMP are intercon¬ 
nected by the Ynet, a tree-structured net¬ 
work with processor modules as leaves. 
Figure 4 depicts the interconnection of 
eight processors using Ynet node expan¬ 
sion modules. The Ynet can be used to 
connect up to 1,024 processors. It operates 
on a 6-MHz clock, with requests moving 
through one stage per clock period. The 
maximum transfer rate of 6 Mbytes per 
second is limited by the root node in the 
hierarchy. The Ynet is duplicated for relia¬ 
bility. 

Figure 4. The Teradata multilevel network.

Figure 5. The Sequoia architecture.

Operation of the DBC/1012 in response
to a database inquiry can be summarized as
follows: The user interfaces to the system
on the host processors by issuing standardized
Structured Query Language statements,
which support a relational database
model. The SQL statements are forwarded 
to the IFP and COP modules, where they 
are translated into work steps to be broad¬ 
cast to the AMP modules. Messages travel 
synchronously up the Ynet hierarchy and 
are sorted at each stage of the network. To 
ensure equal access to the network, a 
message has an “age” associated with it. 
The lowest age identifier wins. The losers 
in the arbitration have their ages adjusted. 

Distribution of data and broadcasting of 
inquiries are transparent to the user. When 
the message reaches the root node of the 
network, it is broadcast to all leaves. Be¬ 
cause of the Ynet’s ability to broadcast, a 
message can be delivered to a single pro¬ 
cessor, a subset of processors, or all pro¬ 
cessors of a predetermined class. The Ynet
is circuit switched and waits for a positive 
acknowledgment before proceeding. 
Negative acknowledgments carry a lower
priority value, so they win the arbitration. Nega¬
tive acknowledgments are broadcast to all 
nodes so that they can back out of their 
operation. The data is arranged in a rela¬ 
tional table in which the rows are distrib¬ 
uted evenly across the disk via a hash func¬ 
tion performed on the primary index. 
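
A short Python sketch shows how hashing the primary index spreads rows evenly. The hash actually used by the DBC/1012 is not given in this article, so SHA-256 from the standard `hashlib` module is used purely as a stand-in, and `amp_for` and the sample keys are hypothetical.

```python
import hashlib
from collections import Counter


def amp_for(primary_index: str, n_amps: int) -> int:
    """Map a row's primary index value to one of n_amps access module processors."""
    digest = hashlib.sha256(primary_index.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer bucket selector.
    return int.from_bytes(digest[:8], "big") % n_amps


# Place 8,000 hypothetical rows on 8 AMPs and count how many land on each.
placement = Counter(amp_for(f"customer-{i}", 8) for i in range(8000))
```

Because placement depends only on the key, any processor can compute where a row lives without consulting a directory, and a well-mixed hash keeps the per-AMP row counts close to one another.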

The AMPs perform the necessary data 
fetching and extraction. Data responses are 
sent back to the IFP or COP that initiated 
the request. The Ynet allows data sets to be 
merged and sorted when more than one 
AMP responds. The merging and sorting 
process is a simple extension of the arbitra¬ 
tion process, in which the lowest value key 
wins at each stage and the losers arbitrate
again. Finally, the IFP or COP routes the 
responses back to the host.
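
The merge-by-arbitration idea can be sketched as a repeated contest among the AMPs' current rows. The Python below flattens the tree into a single comparison per round, which is an assumption made for brevity; `ynet_merge` is a hypothetical name, and each input list stands for one AMP's response rows, already sorted by key.

```python
from typing import List, Sequence


def ynet_merge(streams: Sequence[Sequence[int]]) -> List[int]:
    """Merge sorted per-AMP responses by letting the lowest key win each round."""
    positions = [0] * len(streams)
    merged: List[int] = []
    while True:
        # Each AMP that still has rows offers its current lowest key.
        contenders = [
            (streams[i][positions[i]], i)
            for i in range(len(streams))
            if positions[i] < len(streams[i])
        ]
        if not contenders:
            return merged
        key, winner = min(contenders)  # lowest key wins the arbitration
        merged.append(key)
        positions[winner] += 1  # the winner advances; the losers arbitrate again


merged = ynet_merge([[1, 4, 9], [2, 3, 10], [5]])
```

The result is a single sorted stream, produced without any processor having to collect and sort all the responses itself.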

Data is stored on the DSUs as both pri¬ 
mary and fallback copies. The fallback 
data is distributed to other disks in the 
cluster, so that two disk failures are neces¬ 
sary for a loss of data to occur. The fallback 
copy of the data is selected by table. When 
a failure occurs, a parallel rollback and 
recovery operation reconstructs rows in 
the database from transient journal entries. 
As in most fault-tolerant systems, the re¬ 
covery proceeds in successively more 
severe stages. In the simplest case a single 
transaction is backed out and retried. If an 
AMP, IFP, or Ynet board is lost, the DBC/ 
1012 performs a warm restart composed of 
reconfiguration and the parallel rollback 
and recovery process described above. If 
the host computer is lost (due to either host 
computer failure or loss of the Teradata 
director program in the host), the session 
must be reestablished either when the host 
becomes available or through another host. 

The DBC/1012 performs extensive self¬ 
testing during start-up. Error logs are 
stored as tables that can be analyzed via 
SQL requests. To assist diagnosis, a na¬ 
tional repair center keeps a running history 
of problems. 

Because of its modular structure, a DBC/ 
1012 can be constructed from as little as a 
single AMP, DSU, or IFP and one Ynet 
node with any mix of up to 1,024 IFP/COP/
AMP modules. 

More than 200 DBC/1012 systems have 
been installed worldwide in such diverse 
industries as communications, banking 
and financial services, aerospace, govern¬ 
ment, insurance, retail sales, and original 
equipment manufacturing. The largest 
system comprises over 400 processors. 
More information on the Teradata machine 
can be found in a paper by Neches. 9 

Sequoia. Sequoia produces a modularly 
expandable fault-tolerant multiprocessor 
for on-line transaction processing. Its 
architecture is built on industrial standards 
such as the Motorola 68000 microproces¬ 
sor, the Multibus for peripherals, and the 
Unix operating system. The system is 
composed of three major element types: 
processor, memory, and I/O. Elements are 
interconnected by a dual, segmented sys¬ 
tem bus (see Figure 5). 

Each processing element is composed of 
two M68000 family microprocessors with 
comparators for detecting errors. Memory 
elements use error-correcting code to de¬ 
tect and correct memory data errors. The 
memory controller in each memory element
is duplicated and checked by com¬
parator logic. The I/O elements also em¬ 
ploy duplicated and matched processors. If 
a mismatch is detected by any of the dupli¬ 
cate and matched units in a processor ele¬ 
ment, a memory element, or an I/O ele¬ 
ment, the element electrically isolates it¬ 
self from the rest of the system and raises 
a flag. 

In addition to processor and controller 
duplication, the system employs extensive 
parity checking on the processor caches, 
I/O buffers, and bus data paths. Further¬ 
more, each element involved in a bus trans¬ 
action monitors control signals and flags 
unexpected results. Finally, at runtime the 
operating system executes a set of asser¬ 
tions used to validate the expected results 
in the associated operating system 
function. 

Both the main memory and disks are 
mirrored to recover from failure. Periodi¬ 
cally data in the local cache of the proces¬ 
sor is flushed to main memory so that the 
memory’s image of a process is consistent 
after a flush. The cache flush data is first 
written to the backup memory element and 
then to the primary. If a failure occurs 
during the flush operation, the memory 
element that was not being modified has a 
consistent image of the process state. A 
similar procedure is used between the 
memory elements and the mirrored disks. 
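
The reason for writing the backup first can be seen in a small simulation. This is a sketch, not Sequoia's implementation: memory elements are modeled as Python dictionaries, a failure is injected partway through one of the two write phases, and all names are hypothetical.

```python
from typing import Dict, Optional


class FlushInterrupted(Exception):
    """Simulated failure in the middle of a write phase."""


def write_all(target: Dict[str, int], lines: Dict[str, int], fail_after: Optional[int]) -> None:
    for n, (addr, value) in enumerate(lines.items()):
        if fail_after is not None and n == fail_after:
            raise FlushInterrupted
        target[addr] = value


def flush(lines, backup, primary, fail_in: Optional[str] = None) -> str:
    """Flush to the backup element, then the primary; report which image is consistent."""
    try:
        write_all(backup, lines, 1 if fail_in == "backup" else None)
    except FlushInterrupted:
        return "primary"  # primary was never touched: old image, still consistent
    try:
        write_all(primary, lines, 1 if fail_in == "primary" else None)
    except FlushInterrupted:
        return "backup"  # backup was fully written: new image, consistent
    return "both"


old = {"a": 1, "b": 2}
new = {"a": 10, "b": 20}

backup1, primary1 = dict(old), dict(old)
survivor1 = flush(new, backup1, primary1, fail_in="backup")

backup2, primary2 = dict(old), dict(old)
survivor2 = flush(new, backup2, primary2, fail_in="primary")
```

In both failure cases exactly one element is left half-written, and the other holds a complete image, either the state before the flush or the state after it.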

In the case of transient faults the element 
resumes service. Permanent faults or intermittents
beyond a preset threshold can
cause the element to be logically removed 
from the system and a message to be sent to 
the customer support center. Since the 
operating system maintains a central list of 
tasks waiting for execution, the work load 
is automatically distributed over the avail¬ 
able resources and the system can maintain 
operation in a degraded mode. 

The Sequoia system can be expanded to 
64 processors, 128 memories, and 96 I/O 
elements. An article by Bernstein contains 
additional information. 10 


Each of the seven commercial sys¬
tems briefly introduced here pro¬ 
vides different responses to the 
questions posed at the beginning of the 
article. All the systems focus on high 
availability, correct storage of data, unique 
designs, and configuration into multi¬ 
processor systems. The VAX 8600, the 
IBM 3090, and the VAXft 3000 use gen¬ 
eral-purpose operating systems, while
Tandem, Stratus, Teradata, and Sequoia
systems use operating systems optimized
for transaction processing. The penalty for 
nonfault-tolerant users ranges from 10 
percent to 300 percent when considering 
processor logic alone. 

Fault-tolerant computing techniques 
first appeared in special-purpose, dedi¬ 
cated systems. As hardware costs de¬ 
creased and critical applications arose 
requiring dependability beyond that pro¬ 
vided by general-purpose commercial 
systems, fault-tolerant architectures were 
developed. The success of these systems in 
applications such as transaction process¬ 
ing, electronic funds transfer, communica¬ 
tions, and process control has been noted 
by the general-purpose computing com¬ 
munity. Likewise, semiconductor manu¬ 
facturers have produced chips, such as 
ECC encoders/decoders and duplication/ 
match circuitry in microprocessors, that 
support the design of fault-tolerant sys¬ 
tems. Thus, fault-tolerant techniques are 
now appearing routinely in general-pur¬ 
pose commercial computing systems. 
Designers now balance cost, performance, 
and dependability. This trend will not only 
continue but accelerate as the cost of unde¬ 
pendability becomes intolerable.

Both experimental and real-life
safety-related systems have begun 
to use design diversity to tolerate 
software faults. 1 Such systems focus 
strongly on design faults, where the term 
“design” encompasses everything from 
system requirements to realization during 
both initial production and future modifi¬ 
cations. Design faults are a source of 
common-mode failures, which defeat 
fault-tolerance strategies based on strict 
replication (that cope with physical faults) 
and generally have catastrophic conse¬ 
quences. 

Precomputer safety-related systems 
minimized common-mode failures 
through diversified design, that is, two or 
more systems delivering the same service 
through separate designs and realizations. 
A typical example is a hardwired elec¬ 
tronic channel backed by an electro¬ 
mechanic or electropneumatic channel. In 
addition, system architecture was based on 
the federation of equipment, where each 
piece of equipment implemented one or 
more subfunctions of the system rather 
than the entire system. Such partitioning 
confined equipment failures to subfunc¬ 
tions, allowing the system’s global func¬ 
tion to continue, although possibly in a 
degraded mode. 



Systems in which one 
piece of hardware 
supports multiple 
software are subject to 
software failures and 
require architectures 
that tolerate both 
hardware and 
software faults. 


Computer-based safety-related systems 
generally retain the federation approach. 
Each subfunction is implemented by a 
“complete” computer comprising hard¬ 
ware and executive and application soft¬ 
ware. Examples of this approach include 
airplane flight-control systems (such as in 
the Boeing 757/767 airliner) and nuclear- 
plant monitors (such as Merlin-Gerin’s 
Système de Protection Intégré Numérique).

To confine computer failures, a system
must automatically check execution re¬ 
sults for the errors that could lead to fail¬ 
ure. There are two main approaches to de¬ 
tecting errors caused by design faults: 

(1) Acceptance tests of the results via 
executable assertions. These asser¬ 
tions are generalized, formalized 
versions of likelihood checks used 
in process control. 

(2) Diversified design, so that the re¬ 
sults of two software variants can be 
compared (as in the Airbus A-300 
and A-310 airliners and the Swedish 
railways’ interlocking system). 1 
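
The two detection approaches can be contrasted in a few lines of Python. This is a minimal sketch under stated assumptions: integer square root is an arbitrary example service, the deliberately faulty variant is invented for the demonstration, and `recovery_block` and `compare_variants` are hypothetical helper names.

```python
import math
from typing import Callable, Sequence


def recovery_block(
    variants: Sequence[Callable[[int], int]],
    acceptance_test: Callable[[int, int], bool],
    x: int,
) -> int:
    """Approach 1: run variants in turn until a result passes the executable assertion."""
    for variant in variants:
        try:
            result = variant(x)
        except Exception:
            continue  # a crashing variant counts as a failed attempt
        if acceptance_test(x, result):
            return result
    raise RuntimeError("no variant passed the acceptance test")


def compare_variants(a: Callable[[int], int], b: Callable[[int], int], x: int) -> int:
    """Approach 2: run two diversified variants and accept only matching results."""
    result_a, result_b = a(x), b(x)
    if result_a != result_b:
        raise RuntimeError("variants disagree")
    return result_a


def is_integer_sqrt(x: int, r: int) -> bool:
    # Executable assertion derived from the specification of the service.
    return r >= 0 and r * r <= x < (r + 1) * (r + 1)


faulty = lambda x: x // 2                # variant with a design fault
by_float = lambda x: int(x ** 0.5)       # independently designed variant

accepted = recovery_block([faulty, math.isqrt], is_integer_sqrt, 10)
agreed = compare_variants(math.isqrt, by_float, 10)
```

The acceptance test judges a result against the specification, while comparison judges the variants only against each other; a fault shared by both variants would pass the comparison undetected.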

The federation approach generally re¬ 
quires far more processing elements than 
are needed for computing power alone; for 
instance, the Boeing 757/767 flight-control
system comprises 80 distinct functional 
microprocessors, 300 when we account for 
redundancy. 

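July 1990

Table 1. Main characteristics of the software-fault-tolerance strategies.

Recovery blocks (RB)
  Error-processing technique: Error detection by acceptance tests and backward recovery
  Judgment on result acceptability: Absolute, with respect to specification
  Variant-execution scheme: Sequential
  Consistency of input data: Implicit, from backward recovery principle
  Suspension of service delivery during error processing: Yes, duration necessary for executing one
  No. of variants to tolerate f sequential faults: f+1

N self-checking programming (NSCP), detection by acceptance tests
  Error-processing technique: Error detection and result switching
  Judgment on result acceptability: Absolute, with respect to specification
  Variant-execution scheme: Parallel
  Consistency of input data: Explicit, by dedicated mechanisms
  Suspension of service delivery during error processing: Yes, duration necessary for result switching
  No. of variants to tolerate f sequential faults: f+1

N self-checking programming (NSCP), detection by comparison
  Error-processing technique: Error detection and result switching
  Judgment on result acceptability: Relative, on variant results
  Variant-execution scheme: Parallel
  Consistency of input data: Explicit, by dedicated mechanisms
  Suspension of service delivery during error processing: Yes, duration necessary for result switching
  No. of variants to tolerate f sequential faults: 2(f+1)

N-version programming (NVP)
  Error-processing technique: Vote
  Judgment on result acceptability: Relative, on variant results
  Variant-execution scheme: Parallel
  Consistency of input data: Explicit, by dedicated mechanisms
  Suspension of service delivery during error processing: No
  No. of variants to tolerate f sequential faults: f+2

We could use computers better in such
systems if the same hardware supported
software for several subfunctions. Such an
approach, called integration, is subject to
software failures, which are due to design
faults only. Thus, integration requires
software-fault tolerance. Moreover, some
safety-related systems (such as those in the
NASA Space Shuttle and the Airbus A-320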
airliner) are moving toward limiting or 
eliminating manual or noncomputer 
backup systems. This is an additional in¬ 
centive for software-fault tolerance, since 
safe system behavior becomes entirely 
dependent on reliable software behavior. 

This article elaborates on previous work 
to present a structured definition of hard¬ 
ware- and software-fault-tolerant architec¬ 
tures. 2 We have tried to be as general as 
possible, dealing with specific classes of 
faults or techniques only when necessary. 
(More specific definitions extending the 
recovery block approach 3 and N-version 
programming 4 have appeared elsewhere.) 
After discussing software-fault-tolerance 
methods, we present a set of hardware- and 
software-fault-tolerant architectures and 
analyze and evaluate three of them. A side- 
bar addresses the cost issues related to soft¬ 
ware-fault tolerance. 

Software-fault- 
tolerance methods 

In a diversified design, the different 
systems produced from a common service 
specification are called variants. A diver¬ 
sified design has at least two variants plus 
a decider, which monitors the results of 
variant execution, given consistent initial 


conditions and inputs. The common speci¬ 
fication must explicitly address the deci¬ 
sion points, that is, it must state when to 
make decisions and what data to base them 
on (the data processed by the decider). 

The best-documented techniques for 
tolerating software design faults are the 
recovery block (RB) approach 5 and 
N-version programming (NVP). 6 In the first 
approach, the variants are called alternates 
and the decider is an acceptance test, which 
is applied sequentially to the alternates’ 
results. If the results of the primary alter¬ 
nate do not satisfy the acceptance test, the 
secondary alternate executes. In the sec¬ 
ond approach, the variants are called ver¬ 
sions, and the decider is a vote based on all 
versions’ results. 
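
The recovery block scheme just described can be sketched as follows. This is a minimal illustration under our own assumptions: the names `recovery_block` and `RecoveryBlockFailure` are hypothetical, the state is modeled as a plain dictionary, and the recovery point is a deep copy rather than a recovery cache.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no alternate passes the acceptance test."""


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], Any]],
                   acceptance_test: Callable[[dict, Any], bool]) -> Any:
    """Run the alternates sequentially until one result is accepted."""
    checkpoint = copy.deepcopy(state)        # recovery point
    for alternate in alternates:             # primary first, then secondaries
        try:
            result = alternate(state)
            if acceptance_test(state, result):
                return result
        except Exception:
            pass                             # an exception counts as a detected error
        # Backward recovery: restore the state saved at the recovery point.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("all alternates were rejected")
```

The same acceptance test is applied to every alternate, which is why related faults between an alternate and the test itself escape detection.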

We use the term “variant” rather than 
“alternate” or “version” because “alter¬ 
nate” reflects sequential execution, which 
is a feature specific to the recovery block 
approach, and “version” has another mean¬ 
ing: successive versions of a system result¬ 
ing from fault removal or functionality 
evolution. During the life of a diversely 
designed system, several versions of the 
variants will be generated. 

The hardware-fault-tolerant architectures 
equivalent to RB and NVP are standby 
sparing and N-modular redundancy, 
respectively. A third approach to hard- 
ware-fault tolerance, active dynamic re¬ 
dundancy, is very popular (especially 


when based on self-checking components, 
such as in the AT&T Electronic Switching 
System and the Stratus system), but it 
has not been described in the literature as 
a generic technique for software-fault 
tolerance. However, self-checking pro¬ 
gramming has long been defined; 7 a self¬ 
checking program results from adding re¬ 
dundancy to a program so that it can check 
its own dynamic behavior during execu¬ 
tion. A self-checking software component 
consists of either a variant and an accep¬ 
tance test or two variants and a comparison 
algorithm. 

Fault tolerance is provided by the paral¬ 
lel execution of at least two self-checking 
components. At each execution of such a 
system, one component “acts” (that is, it 
delivers service or results to the controlled 
or monitored application), while the other 
components remain “hot” spares. When 
the acting component fails, a spare begins 
delivering service. If a spare fails, the act¬ 
ing component continues delivering ser¬ 
vice. Error processing is thus performed 
through error detection and possible 
switching of results. We call this approach 
N self-checking programming (NSCP). 
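
As an illustration, NSCP can be sketched with both kinds of self-checking component. This is a hedged sketch, not a reference implementation: the function names are hypothetical, parallel execution is modeled by simply evaluating every component before selecting a result, and a component signals a detected error by returning `None`.

```python
from typing import Callable, Optional, Sequence

Variant = Callable[[float], float]
Component = Callable[[float], Optional[float]]


def self_checking_by_test(variant: Variant,
                          test: Callable[[float, float], bool]) -> Component:
    """Self-checking component made of one variant and an acceptance test."""
    def component(x: float) -> Optional[float]:
        result = variant(x)
        return result if test(x, result) else None
    return component


def self_checking_by_comparison(v1: Variant, v2: Variant,
                                tol: float = 1e-9) -> Component:
    """Self-checking component made of two variants and a comparison."""
    def component(x: float) -> Optional[float]:
        r1, r2 = v1(x), v2(x)
        return r1 if abs(r1 - r2) <= tol else None
    return component


def nscp(components: Sequence[Component], x: float) -> float:
    """Deliver the acting component's result, switching to a hot spare on error."""
    results = [c(x) for c in components]     # all components execute every time
    for result in results:                   # acting component first, then spares
        if result is not None:
            return result
    raise RuntimeError("every self-checking component detected an error")
```

Because the spares have already executed, error processing reduces to switching results; no state needs to be rolled back.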

It could be argued that NSCP is just a 
parallel recovery block scheme, but the 
latter’s backward recovery strategy pre¬ 
vents it from being reduced to the associa¬ 
tion of alternates together with an accep¬ 
tance test. In NSCP, when a self-checking 
Table 2. Overheads for tolerance of one software fault. 

| Method | Structural overhead: diversified software layer | Structural overhead: mechanisms | Operational time overhead, systematic: decider | Operational time overhead, systematic: variants execution | Operational time overhead when errors occur |
|---|---|---|---|---|---|
| Recovery blocks | One variant and one acceptance test | Recovery cache | Acceptance test execution | Accesses to recovery cache | One variant and acceptance test execution |
| N self-checking programming, error detection by acceptance tests | One variant and two acceptance tests | Result switching | Acceptance test | Input data consistency and variants execution synchronization | Possible result switching |
| N self-checking programming, error detection by comparison | Three variants | Comparators and result switching | Comparison | Input data consistency and variants execution synchronization | Possible result switching |
| N-version programming | Two variants | Voters | Vote execution | Input data consistency and variants execution synchronization | Usually negligible |


software component is based on the asso¬ 
ciation of two variants, only one variant 
fulfills the expected functions, while the 
other acts as an extended acceptance test. 
Each self-checking component in NSCP is 
responsible for determining whether a 
delivered result is acceptable, whereas the 
judgment of acceptability in NVP is co¬ 
operative. Also, each acceptance test asso¬ 
ciated with a variant, or each comparison 
algorithm associated with a pair of vari¬ 
ants, can be the same or can be specifically 
derived from a common specification for 
each variant or variant pair. As in N- 
version programming, the components’ 
parallel execution necessitates a mecha¬ 
nism to ensure input consistency. 

Our aim in this article is to classify the 
various approaches to software-fault toler¬ 
ance, not to introduce a new approach. In 
fact, most of the real-life systems men¬ 
tioned in the introduction do not imple¬ 
ment either a recovery block approach or 
N-version programming, but rather are 
based on self-checking software. For in¬ 
stance, the Airbus A-300 and A-310 flight- 
control systems and the Swedish railways’ 
interlocking system are based on the paral¬ 
lel execution of two variants that stop 
operation when a comparison of their re¬ 
sults reveals an error. The Airbus A-320 
flight-control system is based on two self¬ 
checking components, each based in turn 
on the parallel execution of two variants 
whose results are compared. Tolerance of 
a single fault in this system requires four 
variants. (The two self-checking compo¬ 


nents in this last scheme do not deliver 
exactly the same service. Critical func¬ 
tions are preserved when the system 
switches from the acting component to the 
spare, but noncritical functions are per¬ 
formed in a degraded mode.) 

Table 1 summarizes the main character¬ 
istics of the three strategies. In selecting a 
strategy for a given application, pay particular 
attention to the method for judging 
result acceptability and whether service 
delivery is suspended when an error oc¬ 
curs. Table 2 summarizes the main sources 
of structural and operation-time overhead 
for software-fault tolerance. The table does 
not mention overhead imposed by tests 
local to each variant, such as input-range 
checking and checks for grossly wrong results, since 
such tests are common to all approaches 
(and are — or should be — present in non¬ 
fault-tolerant software systems, as well). 

Fault classes. We classify faults ac¬ 
cording to their independence and their 
persistence. 


Independence. Faults are either related 
or independent. 6 Related faults result from 
a specification fault common to all vari¬ 
ants or from dependencies in the separate 
designs and implementations. Independent 
faults are simply those that are not related. 
Related faults manifest themselves as 
similar errors and lead to common-mode 
failures, whereas independent faults usu¬ 
ally cause distinct errors and separate fail¬ 
ures. Figure 1 illustrates these definitions. 

Persistence. Faults are classified as 
solid or soft based on their persistence. 
Such a distinction is usual in hardware, 
where a fault’s solidity or softness is im¬ 
portant to fault tolerance. A component 
affected by a solid fault must be made 
passive after the fault is detected, whereas 
a component affected by a soft fault can be 
used after error recovery. In other words, a 
solid fault necessitates error processing 
and fault treatment, while a soft fault re¬ 
quires only error processing. A permanent 
fault is a typical solid fault, and a tempo- 




Figure 1. Classes of faults, errors, and failures. 
The cost of software-fault tolerance 


Fault tolerance introduces additional 
costs; we estimate those costs here. 
Since design diversity affects costs dif¬ 
ferently according to the life-cycle 
phases, we start with cost distribution 
among the various life-cycle activities 
for classical, non-fault-tolerant software. 
Our simplified life-cycle model 1 
(see the first table) groups all activities 
relating to verification and validation 
(V&V) separately. 

Three maintenance categories cover 
the software's entire operational life. 1 
Corrective maintenance concerns fault 
removal and involves design, imple¬ 
mentation, and V&V. Adaptive mainte¬ 
nance adjusts software to environ¬ 
mental changes and also involves 
specification activity. Perfective mainte¬ 
nance improves the software’s function; 
thus, it actually concerns software evo¬ 
lution, and so involves all development 
activities, starting with modified require¬ 
ments. 

The cost breakdowns for the life cycle 
and maintenance 1 do not address 
a specific class of software. However, 
since we are concerned with critical ap¬ 
plications, we must incorporate some 
multiplicative factors that depend on the 
particular activity. 2 The last two col¬ 
umns, which are derived from the data 
in the other columns, give the life-cycle 
cost distribution for development only 
and for development and maintenance. 


Software cost elements for non-fault-tolerant software. 

| Activity | Life-cycle cost breakdown 1 | Multipliers for critical applications 2 | Cost distribution: development | Cost distribution: development and maintenance |
|---|---|---|---|---|
| Requirements | 3% | | 8% | 6% |
| Specification | 3% | | 8% | 7% |
| Design | 5% | | 13% | 14% |
| Implementation | 7% | | 19% | 19% |
| Verification and validation | 15% | | 52% | 54% |
| Maintenance* | 67% | | | |

* Of this, 20% is for corrective maintenance, 25% is for adaptive maintenance, and 55% is for perfective 
maintenance. 1 


From this table, it appears that main¬ 
tenance does not significantly affect 
cost distribution over the other life-cycle 
activities (in fact, the discrepancy is 
likely to be lower than indicated). Ac¬ 
cordingly, let’s assume in the following 
example that the figures for develop¬ 
ment only are general and cover the en¬ 
tire life-cycle, since we are concerned 
only with relative costs. 

To determine the cost of fault-tolerant 
software, we must introduce factors to 
account for the overheads associated 
with the decision points and the deciders 
and to account for the cost reduction 
in V&V caused by commonalities 
among variants. These commonalities 
include actual V&V activities, such as 


back-to-back testing, and V&V tools, 
such as test harnesses. We cannot ac¬ 
curately estimate such factors given the 
current state of the art. We can, how¬ 
ever, give reasonable ranges of vari¬ 
ations. 

Consider the following factors: 

• r is the multiplier associated with the 
decision points, with 1 < r < 1.2. 

• s is the multiplier associated with the 
decider, with 1 < s < 1.1 for NVP and 
NSCP when error detection is per¬ 
formed through comparison, and 1 < s 
< 1.3 for RB and NSCP when error de¬ 
tection is performed through accep¬ 
tance tests. This difference reflects the 
differences in the deciders, that is, the 
fact that the deciders are specific when 


rary fault (either transient or intermittent) 
is a typical soft fault. 

Let’s now consider software faults in 
operational programs. Once a program has 
been thoroughly debugged, problems are 
more likely to arise from subtle fault con¬ 
ditions (such as limit conditions, race 
conditions, or strange underlying hard¬ 
ware conditions) than from easily identifi¬ 
able faults. Just a slight change in the 
execution context could keep fault condi¬ 
tions from occurring again, thus keeping 
the software from failing again. Since the 
likelihood of such an error occurring again 
is negligible, we can extend the notion of a 
soft fault to software. 8 

Another important consideration for 
error recovery is the notion of local and 
global variables for the components. Let’s 
call the program between two decision 
points a diversity unit. Generally, error 
recovery requires that the diversity units 
be procedures (so their activation and 
behavior do not depend on any internal 
state). In other words, all data needed by a 
diversity unit must be global data. The 
data’s global nature can result from the 
nature of the application itself. One ex¬ 
ample is physical-process monitoring 
(such as nuclear-plant protection), where 
tasks begin based on sensor data and do not 
use data from previous processing. The 
data’s global nature can also result from 
transforming local data into global data. 
This incurs overhead and could decrease 
diversity (since the decision-point specifi¬ 
cation must be more precise). A simplified 
example is a filtering function that consti¬ 
tutes a diversity unit. In this example, past 
samples should be part of the global data. 
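
The filtering example can be sketched as follows. The moving-average filter, its window length, and the function name are our own hypothetical choices; the point is only that the past samples are passed in and handed back explicitly, so the diversity unit keeps no internal state between activations.

```python
from typing import Tuple


def moving_average_unit(past_samples: Tuple[float, ...],
                        new_sample: float,
                        window: int = 4) -> Tuple[float, Tuple[float, ...]]:
    """Diversity unit written as a procedure: no state survives the call."""
    samples = (past_samples + (new_sample,))[-window:]
    output = sum(samples) / len(samples)
    return output, samples


# The caller, outside the variant, owns the global data (the sample history).
history: Tuple[float, ...] = ()
for sample in (1.0, 2.0, 3.0):
    filtered, history = moving_average_unit(history, sample)
```

Since the history is global data, a recovered or substituted variant can resume from the same history as the variant it replaces.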

Although these classifications apply to 
all software-fault-tolerance methods, we 
can alter the general rules somewhat in 
specific, application-dependent cases. For 
example, there is an alternate solution for 
NSCP and NVP when the overhead cannot 
be afforded or when transforming local 
data into global data will decrease diver¬ 


sity too much. This solution involves fault 
treatment, that is, it eliminates failed vari¬ 
ants from further processing. 

Let’s summarize the preceding discus¬ 
sion by adopting the following definitions 
for soft and solid faults: A soft software 
fault has a negligible likelihood of recur¬ 
rence and is recoverable, while a solid 
software fault is recurrent under normal 
operation or cannot be recovered. 

Defining hardware- 
and software-fault- 
tolerant architectures 

Our discussion of architectures that tol¬ 
erate both hardware and software faults 
emphasizes the dependencies among the 
software- and hardware-fault-tolerance 
methods and the effects of solid and soft 
software faults on the architecture defini¬ 
tion. We investigate two levels of fault- 
Cost of fault-tolerant software versus non-fault-tolerant software. 

| Faults tolerated | Fault-tolerance method | N | C_FT/C_NFT (min) | C_FT/C_NFT (max) | C_FT/C_NFT (av) | C_FT/(N C_NFT) (av) |
|---|---|---|---|---|---|---|
| 1 | Recovery blocks | 2 | 1.33 | 2.17 | 1.75 | .88 |
| 1 | N self-checking programming, acceptance test | 2 | 1.33 | 2.17 | 1.75 | .88 |
| 1 | N self-checking programming, comparison | 4 | 2.24 | 3.77 | 3.01 | .75 |
| 1 | N-version programming | 3 | 1.78 | 2.71 | 2.25 | .75 |
| 2 | Recovery blocks | 3 | 1.78 | 2.96 | 2.37 | .79 |
| 2 | N self-checking programming, acceptance test | 3 | 1.78 | 2.96 | 2.37 | .79 |
| 2 | N self-checking programming, comparison | 6 | 3.71 | 5.54 | 4.63 | .77 |
| 2 | N-version programming | 4 | 2.24 | 3.77 | 3.01 | .75 |



they decide by acceptance test and ge¬ 
neric when they decide by comparison 
or vote. 

• u is the proportion of testing per¬ 
formed once for all variants (such as 
provision for test environments and har¬ 
nesses), with 0.2 < u < 0.5. 

• v is the proportion of testing, per¬ 
formed for each variant, that takes ad¬ 
vantage of the existence of several vari¬ 
ants (such as back-to-back testing), 
with 0.3 < v < 0.6. 

• w is the cost-reduction factor for 


testing performed in common for several 
variants, with 0.2 < w < 0.8. 

The following expression then gives 
the cost of fault-tolerant software ($C_{FT}$) 
with respect to the cost of non-fault-tolerant 
software ($C_{NFT}$): 

$$C_{FT}/C_{NFT} = P_{Req} + r s P_{Spe} + [N r + (s - 1)] (P_{Des} + P_{Imp}) + r \{ u s + (1 - u) N [v w + (1 - v)] \} P_{V\&V}$$

where N is the number of variants, 
and $P_{Req}$, $P_{Spe}$, $P_{Des}$, $P_{Imp}$, and $P_{V\&V}$ are 

tolerance: architectures tolerating a single 
fault and architectures tolerating two con¬ 
secutive faults. (We can relate these re¬ 
quirements, respectively, to the classical 
Fail Op/Fail Safe and Fail Op/Fail Op/Fail 
Safe requirements used in the aerospace 
community for hardware-fault tolerance.) 

Due to the article’s scope, our discussion 
is highly abstract. We do not discuss such 
distinguishing features as the overhead 
imposed by intercomponent communica¬ 
tion for synchronization, decision-making, 
data consistency, etc., or the differences in 
memory space for each architecture. 

Implementing design diversity. Of the 

many issues involved in design diversity, 6 
two related issues are especially impor¬ 
tant: the number of variants and the level at 
which fault tolerance is applied. 

Number of variants. Aside from eco¬ 
nomic considerations (see the sidebar), the 
number of variants for a given software- 


fault-tolerance method is directly related 
to the number of faults to be tolerated (see 
Table 2). The soft or solid nature of the 
software faults significantly affects the 
architecture only when it must tolerate 
more than one fault. Also note that an 
architecture tolerating a solid fault can also 
tolerate a (theoretically) infinite sequence 
of soft faults, provided there are no fault 
coincidences. 
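
The relation between the number of faults to tolerate and the number of variants, as given in the last column of Table 1, can be written down directly. The function and the method labels below are hypothetical names chosen for this sketch.

```python
def variants_needed(method: str, f: int) -> int:
    """Number of variants needed to tolerate f sequential faults (Table 1)."""
    formulas = {
        "RB": lambda n: n + 1,
        "NSCP-acceptance-test": lambda n: n + 1,
        "NSCP-comparison": lambda n: 2 * (n + 1),
        "NVP": lambda n: n + 2,
    }
    if f < 0:
        raise ValueError("f must be non-negative")
    return formulas[method](f)
```

For f = 1 and f = 2 these formulas give the values of N listed in the sidebar's cost table (2, 2, 4, 3 and 3, 3, 6, 4, respectively).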

The relation between the likelihood of 
such faults and the number of variants is not 
simple. Whether increasing the number of 
variants increases or decreases the number 
of related faults depends on several fac¬ 
tors, some of which affect the others ad¬ 
versely. 6 - 9 However, there is good reason to 
increase the number of variants in NVP: in 
a three-version scheme, two similar errors 
can outvote a good result, while they would 
be detected in a four-version scheme. 
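
The voting argument can be checked with a small sketch. We assume here, as a simplification of our own, that the decider is a strict-majority vote on exactly comparable results; real deciders may use inexact voting.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the strict-majority result, or None when an error is detected."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


# Three versions: two similar errors outvote the single good result.
three = majority_vote(["bad", "bad", "good"])

# Four versions: the same two similar errors produce a 2-2 split,
# so no strict majority exists and the error is detected.
four = majority_vote(["bad", "bad", "good", "good"])
```

In the four-version case the error is detected rather than tolerated: the decider knows something is wrong but cannot tell which pair is correct.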

Level of fault-tolerance application. The 
level of application involves two questions: 


the cost distribution percentages for re¬ 
quirements, specification, design, im¬ 
plementation, and V&V, respectively. 
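
The cost expression can be evaluated numerically. The sketch below rests on several assumptions of ours: we read the expression as P_Req + r s P_Spe + [N r + (s - 1)](P_Des + P_Imp) + r{u s + (1 - u) N [v w + (1 - v)]} P_V&V, we take the development-only percentages of the first table (8, 8, 13, 19, and 52 percent) as the cost distribution, and we choose the extreme values of the factors ourselves.

```python
def cost_ratio(n: int, r: float, s: float, u: float, v: float, w: float,
               p_req: float = 0.08, p_spe: float = 0.08,
               p_des: float = 0.13, p_imp: float = 0.19,
               p_vv: float = 0.52) -> float:
    """Cost of n-variant fault-tolerant software relative to one variant."""
    design_and_code = (n * r + (s - 1)) * (p_des + p_imp)
    verification = r * (u * s + (1 - u) * n * (v * w + (1 - v))) * p_vv
    return p_req + r * s * p_spe + design_and_code + verification


# Two variants, deciders based on acceptance tests (RB case).
# Lowest cost: no decision-point or decider overhead, maximal V&V sharing.
low = cost_ratio(2, r=1.0, s=1.0, u=0.5, v=0.6, w=0.2)
# Highest cost: maximal overheads, minimal V&V sharing.
high = cost_ratio(2, r=1.2, s=1.3, u=0.2, v=0.3, w=0.8)
```

Under these assumptions the two values round to 1.33 and 2.17, the range listed for recovery blocks tolerating one fault.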

The second table gives the ranges for 
the ratio $C_{FT}/C_{NFT}$, as well as the average 
values and the average values per 
variant. In this table, we do not distin¬ 
guish between RB and NSCP with error 
detection by acceptance test, since our 
abstract cost model is likely to mask 
their differences. 

The second table’s results let us 
quantify the qualitative statement that 
N-variant software costs less than N 
times as much as non-fault-tolerant software. Also 
note that previously published figures 3 
fall within the ranges displayed here; 
that an experiment at the University of 
Newcastle upon Tyne estimated RB's 
overhead for two variants at 60 per¬ 
cent 3 ; and that the Project on Diverse 
Software estimated the cost of NVP for 
three variants at 2.26 times the cost of 
a one-variant program. 3 
How much should the system be decomposed 
into components to be diversified? 
Which layers (application software, 
executive, hardware) must be diversified? 

The answer to the first question involves 
a trade-off between two opposing consid¬ 
erations: smaller components allow a bet¬ 
ter mastering of the decision algorithms, 
but larger components aid diversity. In 
addition, the decision points are “non¬ 
diversity” points (and synchronization 
points for NSCP and NVP); as such, they 
must be limited. Decision points are neces¬ 
sary only for interactions with the environ¬ 
ment (sensor data acquisition, delivering 
orders to actuators, operator interaction, 
etc.). However, performance considera¬ 
tions could prompt additional compro¬ 
mises. 

Concerning the second question, the 
methods for tolerating design faults can 
apply to any layer of either the application 
or the executive software. They can also 
apply to the hardware layers. 1 With respect 
Figure 2. Architectures tolerating a single fault. 


to the computation process, the states of 
distinct variants are different. Thus, in 
NSCP and NVP, when the variants execute 
in parallel on distinct (redundant) hardware, 
a given layer’s diversity yields different 
states in its underlying layers, even if 
they are not diversified (except, of course, 
at the decision points). The decision to 
diversify layers underlying the application 
software involves additional considera¬ 
tions, such as determining the influence of 
those portions of the hardware and execu¬ 
tive software specifically designed for the 
application, and determining how much 
confidence to place on experience valida¬ 
tion for off-the-shelf components. 

Structuring principles for architec¬ 
ture definition. Structuring is a prerequi¬ 
site to mastering complexity, especially 
when dealing with fault tolerance. 5 Hard- 
ware-fault-tolerance mechanisms usually 
(and usefully) match the structuring of a 
system into layers. 10 Given performance 
considerations (that is, the time needed to 
recover from an error) and damage created 


by error propagation, it is especially desir¬ 
able that each layer have fault-tolerance 
mechanisms to process errors produced in 
that layer. 

Implementing this principle in hardware 
to deal with software-fault tolerance 
requires that the redundant hardware com¬ 
ponents be in the same state when compu¬ 
tation proceeds without error. Such a con¬ 
dition can be satisfied only if the variants 
execute sequentially, that is, in the RB 
approach. However, the diagnosis of hard¬ 
ware faults could be made possible by 
examining the syndrome provided by the 
deciders of the particular software-fault- 
tolerance method. 

Another useful structuring mechanism 
is the error-confinement area, 10 a notion 
that cannot be considered separately from 
the architectural elements. The particular 
architectural elements we consider are: 

• the hardware and associated executive 
software, which provide the necessary 
services for application software to 
execute (for concision, we call these 


“hardware components”), and 
• the variants of the application software. 


Considering both hardware and soft¬ 
ware faults helps distinguish hardware and 
software error-confinement areas (HECAs 
and SECAs, respectively). In our discus¬ 
sion, a HECA covers at least one hardware 
component, and a SECA covers at least 
one software variant. Because of our defi¬ 
nition of a hardware component, a HECA 
corresponds to that part of the architecture 
made passive after a solid hardware fault. 
It can thus be interpreted as a line replace¬ 
able unit. 

Architectures tolerating a single fault. 

Three architectures correspond to the three 
software-fault-tolerance methods men¬ 
tioned earlier. Figure 2 illustrates the 
SECA and HECA configurations for each 
method. The intersections of the SECAs 
and HECAs characterize the architectures’ 
software- and hardware-fault-tolerance 
dependencies. 
Table 3. Synthesis of the properties of the hardware-and-software-fault-tolerant architectures. 

| Architecture | Hardware components/variants | Properties in addition to nominal fault tolerance: hardware faults | Properties in addition to nominal fault tolerance: software faults | Fault tolerance after a HECA is made passive: hardware | Fault tolerance after a HECA is made passive: software |
|---|---|---|---|---|---|
| RB/1/1 | 2/2 | Low error latency | - | Detection provided by local diagnosis | Tolerance of one independent fault |
| NSCP/1/1 | 4/4 | Tolerance of two faults in hardware components of the same SECA; detection of three or four faults in hardware components | Tolerance of two independent faults in the same SECA; detection of two related faults in disjoint SECAs; detection of two, three, or four independent faults | Detection | Detection of independent faults |
| NSCP/1/1/m | 4/3 | Tolerance of two faults in hardware components of the same SECA | - | Detection | Detection of independent faults |
| NVP/1/1 | 3/3 | Detection of two or three faults | Detection of two or three independent faults | Detection | Detection of independent faults |
| RB/2/1 | 3/2 | Low error latency | - | Detection provided by local diagnosis | Tolerance of one independent fault |
| NSCP/2/1 | 6/3 | Detection of three to six faults in hardware components | Detection of two or three independent faults | Detection | Detection of independent faults |
| NVP/2/1 | 4/3 | Detection of three or four faults in hardware components; tolerance of combinations of single fault in hardware component and independent software fault in nonduplicated variant | Detection of two or three independent faults | Detection | Detection of independent faults |
| NVP/2/2 | 4/4 | Detection of three or four faults in hardware components | Detection of two related faults; tolerance of two independent faults; detection of three or four independent faults | Detection | Detection of independent faults |


We identify the architectures via a con¬ 
densed expression of the form: X/i/j/. . ., 
where X is the software-fault-tolerance 
method (RB, NSCP, or NVP), i is the 
number of hardware faults tolerated, and j 
is the number of software faults tolerated. 
We add further labels to this expression 
when necessary. Table 3 summarizes the 
main fault-tolerance properties of the 
architectures discussed here and in the next 
section. 
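
The condensed expression is regular enough to parse mechanically. The sketch below is purely illustrative; the class and function names are hypothetical, and any further labels (such as the "m" in NSCP/1/1/m) are kept as uninterpreted strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ArchitectureLabel:
    method: str                  # software-fault-tolerance method X
    hardware_faults: int         # i, hardware faults tolerated
    software_faults: int         # j, software faults tolerated
    extra: Tuple[str, ...] = ()  # further labels, if any


def parse_label(text: str) -> ArchitectureLabel:
    """Parse an expression of the form X/i/j/... into its fields."""
    method, i, j, *extra = text.split("/")
    if method not in ("RB", "NSCP", "NVP"):
        raise ValueError(f"unknown method: {method}")
    return ArchitectureLabel(method, int(i), int(j), tuple(extra))
```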

RB/1/1. This architecture duplicates a 
two-variant RB on two hardware compo¬ 
nents. Two variants and their instances of 
the acceptance test constitute two distinct 
SECAs and intersect each HECA. The RB 
method assures that each HECA is soft¬ 
ware-fault tolerant. A variant’s indepen¬ 
dent faults are tolerated, while related 
faults between variants are detected. 


However, related faults between a variant 
and the acceptance test cannot be tolerated 
or detected. 

The hardware components operate in 
hot standby redundancy and always exe¬ 
cute the same variant. Thus, hardware 
faults are detected by a high-coverage, 
concurrent comparison between the accep¬ 
tance test results and the hardware results. 
When a discrepancy is detected during 
execution of the primary alternate or the 
acceptance test, the secondary executes so 
that the fault is tolerated (if the fault is 
soft). If the discrepancy persists (which 
would occur if the fault were solid), the 
failed HECA is identified by running diag¬ 
nostic programs on each HECA. The failed 
HECA is thus made passive and service 
continuity is ensured. 

The architecture remains software-fault 
tolerant after this hardware degradation, 


and subsequent hardware faults are de¬ 
tected by either the acceptance test or peri¬ 
odic execution of the diagnostics. 

NSCP/1/1. The basic NSCP/1/1 archi¬ 
tecture (see Figure 2) comprises 

• four hardware components grouped in 
two pairs in hot standby redundancy, 
each pair forming a HECA; and 

• four variants grouped in two pairs, 
each pair forming a self-checking soft¬ 
ware component, with error detection 
performed by comparison. Each vari¬ 
ant pair also forms a SECA associated 
with a HECA. 

The computational states of the hard¬ 
ware components cannot be directly com¬ 
pared due to the diversification imposed by 
the variants. However, a comparison of 
each variant pair’s results also effectively 
Figure 3. Architectures tolerating two consecutive faults. 


compares the two hardware components in 
each HECA to check hardware faults 
(including design faults). Thus, a HECA is 
also a self-checking hardware component. 

If the results from a HECA’s variant pair 
differ, irrespective of the type of fault, then 
the results are delivered by the other 
HECA. If the discrepancy occurs repeat¬ 
edly, thus indicating a solid hardware fault, 
then the HECA is made passive. The re¬ 
sulting degraded structure still allows de¬ 
tection of both software and hardware 
faults. 

Besides nominally tolerating an inde¬ 
pendent software fault, the NSCP/1/1 
architecture can also 

• tolerate two simultaneous independent 
faults in a SECA, 

• detect a related fault between two 
variants (each pertaining to one of the 
two disjoint SECAs), and 

• detect three or four simultaneous inde¬ 
pendent software faults. 

The NSCP/1/1 architecture corresponds 
to the architectural principle implemented 
in the Airbus A-320. 1 However, since re¬ 
quiring four variants would be prohibitive 


in some applications, a modified architecture 
(NSCP/1/1/m) exists based on just 
three variants (see Figure 2). 

To see the major difference in error 
processing between the NSCP/1/1 and 
NSCP/1/1/m architectures, consider a software 
fault in V2, the variant duplicated in both 
self-checking components. Such a fault would cause 
a discrepancy in both self-checking com¬ 
ponents, implying an associated SECA 
covering all four software components and 
preventing any software-fault tolerance. 
Since this is the only event that can cause 
such an error syndrome (assuming a single 
independent fault), the “correct” result is 
immediately available as the one provided 
by V1 or V3. Hence, the NSCP/1/1/m 
architecture has a third SECA associated 
with V2 alone. However, the three addi¬ 
tional fault-tolerance and detection capa¬ 
bilities of the NSCP/1/1 architecture listed 
above are lost. 
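
The error syndrome of the three-variant architecture can be sketched as follows. We assume, as the text implies, that one self-checking component compares V1 with V2 and the other compares V2 with V3; the function name, the numeric results, and the final cross-check of V1 against V3 are our own additions.

```python
def nscp_1_1_m(v1: float, v2: float, v3: float, tol: float = 1e-9) -> float:
    """Select a result from self-checking components (V1, V2) and (V2, V3)."""
    def agree(a: float, b: float) -> bool:
        return abs(a - b) <= tol

    acting_ok = agree(v1, v2)    # acting component: V1 compared with V2
    spare_ok = agree(v2, v3)     # spare component: V2 compared with V3
    if acting_ok:
        return v1                # no error seen by the acting component
    if spare_ok:
        return v3                # switch results to the spare
    # Both components report a discrepancy. Under the single-independent-fault
    # assumption only V2 can be at fault, so V1 (or V3) holds the result.
    if agree(v1, v3):
        return v1
    raise RuntimeError("syndrome inconsistent with a single independent fault")
```

The last branch is where the single-fault assumption matters: with two simultaneous faults the syndrome no longer identifies a correct variant, which is consistent with the capabilities lost relative to NSCP/1/1.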

NVP/1/1. The NVP/1/1 architecture is a 
direct implementation of the NVP method 
consisting of three hardware components, 
each running a distinct variant. The han¬ 
dling of both hardware faults (including 
design faults) and software faults is per¬ 


formed at the software layer by the decider. 
In addition to tolerating an independent 
fault in a single variant, the architecture 
can detect independent faults in two or 
three variants. 

The problem of discriminating between 
hardware and software faults, so that a 
hardware component is only made passive 
due to a solid fault, demonstrates the de¬ 
pendency between software- and hard¬ 
ware-fault tolerance. Because software 
faults are considered soft, a repeatedly 
disagreeing hardware component could 
easily be treated as a sign of a (solid) 
hardware fault. After the failed hardware 
component is made passive, the decider 
must be reconfigured as a comparator in 
case a hardware or software fault is subse¬ 
quently activated. 
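
The fault treatment described above can be sketched as a decider that counts repeated disagreements. This is a hypothetical illustration only: the class name, the threshold on consecutive disagreements, and the use of exact equality as the agreement criterion are all assumptions of ours.

```python
from collections import Counter
from typing import List, Sequence


class NVPDecider:
    """Majority voter that degrades to a comparator after passivation."""

    def __init__(self, n_components: int = 3, threshold: int = 3) -> None:
        self.active: List[int] = list(range(n_components))
        self.threshold = threshold     # assumed: consecutive disagreements
        self.streak: Counter = Counter()

    def decide(self, results: Sequence[float]) -> float:
        values = {c: results[c] for c in self.active}
        winner, count = Counter(values.values()).most_common(1)[0]
        if count <= len(values) / 2:
            # With two components left, this is a comparison mismatch.
            raise RuntimeError("error detected but not tolerated")
        for component, value in values.items():
            if value == winner:
                self.streak[component] = 0
            else:
                self.streak[component] += 1
                if (self.streak[component] >= self.threshold
                        and len(self.active) > 2):
                    # Repeated disagreement is treated as a solid hardware
                    # fault: the component is made passive.
                    self.active.remove(component)
        return winner
```

Once a component is passive, the same strict-majority rule applied to the two remaining components behaves as a comparator: agreement delivers a result, and any mismatch is only detected.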

Architectures tolerating two consecu¬ 
tive faults. Tolerance of two faults brings 
the distinction between soft and solid soft¬ 
ware faults into play. If the software faults 
are soft, then the number of variants is the 
same as in architectures that tolerate one 
fault. These architectures are of the type 
X/2/1. If the software faults are solid, then 
the number of variants must increase because a failed variant cannot execute further. These architectures are of the type X/2/2.

Figure 3 shows architectures that tolerate two faults. The first three architectures (RB/2/1, NSCP/2/1, and NVP/2/1) tolerate two hardware faults and a single software fault. Another NVP-based architecture (NVP/2/2) deals with solid software faults by tolerating two consecutive (solid) faults in hardware or software.

RB/2/1. This architecture comprises 
three hardware components arranged in 
triple modular redundancy. Its ability to 
tolerate software faults is the same as that 
of RB/1/1. When a solid hardware fault is 
detected, the corresponding hardware 
component is made passive, thus degrad¬ 
ing the architecture to a level analogous to 
the RB/1/1 architecture. Accordingly, each 
hardware component must include local 
diagnosis, even if it is basically useless in 
handling the first hardware fault. 

NSCP/2/1. This architecture is a direct extension of NSCP/1/1/m. A supplementary duplex HECA supports a software self-checking component made up of two variants, resulting in a symmetric distribution of the three SECAs among the three HECAs. Since all the variants are duplicated, hardware faults can be instantly diagnosed by comparing the results from all hardware components. The architecture also detects simultaneous independent faults in two or three variants.

NVP/2/1. The NVP/2/1 architecture adds 
a hardware component to the NVP/1/1 
architecture without introducing another 
variant. To maintain software-fault toler¬ 
ance after a hardware component has been 
made passive, at least two instances of 
each variant must pertain to two distinct 
HECAs. Figure 3 shows only one of 18 
possible configurations. 

Of the two distinct variants associated with each HECA, one is active and the other is idle. At a given execution step, three hardware components execute three distinct variants, while the fourth hardware component executes a replica of one of the variants (V1 in this configuration). In addition to tolerating an independent software fault, this architecture can detect two or three simultaneous independent faults.

Tolerance of an independent fault is based on a vote incorporating the knowledge that two variants are identical. The unbalanced number of variant executions allows use of a double vote decision to improve the diagnosis of hardware faults (each vote includes the results of the nonduplicated variants and only one of the results of the duplicated variant). To understand this scheme, consider what happens when

• a hardware fault causes an error in one of the hardware components executing the duplicated variant (V1),

• a software fault causes an error in the duplicated variant,

• a hardware fault causes an error in one of the hardware components executing the nonduplicated variants (V2, V3), or a software fault causes an error in one of these variants.

In the first example, the fault is easily 
tolerated and diagnosed, since the three 
results agree on one vote and disagree on 
the second. Hence, the result of the dupli¬ 
cated variant is designated as false. 

In the second example, the decider rec¬ 
ognizes that the two votes are not unani¬ 
mous and designates as false the results 
supplied by the duplicated variant. The 
“correct” result is thus immediately avail¬ 
able as the one provided by the nondupli¬ 
cated variants. 


In the last example, tolerance is immedi¬ 
ate, but the votes do not allow fault diagno¬ 
sis. Since software faults are assumed to be 
nonrecurring, repeated failure of a hard¬ 
ware component leads to a diagnosis of a 
hardware fault. However, another form of 
diagnosis lets us relax this assumption: 
when a localized fault (attributable to one 
SECA or one HECA) occurs, the next 
execution step is performed after the active 
variants are reconfigured to match the 
duplicated variant with the affected 
HECA. The decider must then solve 
for one of the first two examples. A 
systematic rotation of the duplicated vari¬ 
ants would also contribute to such a 
diagnosis. 
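The three examples discussed for the NVP/2/1 double vote can be sketched in a few lines of Python. This is only an illustration under the single-fault assumption of the text: the function name `double_vote`, the argument names, and the diagnosis strings are hypothetical, and results are compared by simple equality rather than by any real decider's agreement test.

```python
from collections import Counter


def majority(values):
    """Return the value held by a strict majority of `values`, or None."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def double_vote(v1_a, v1_b, v2, v3):
    """Decide on four results: two replicas of V1 (on two HECAs), V2, and V3.

    Each vote holds V2, V3, and exactly one replica of the duplicated variant.
    Returns (result, diagnosis); result is None when no result is acceptable.
    """
    vote_a = (v1_a, v2, v3)
    vote_b = (v1_b, v2, v3)
    unanimous_a = len(set(vote_a)) == 1
    unanimous_b = len(set(vote_b)) == 1

    if unanimous_a and unanimous_b:
        return v2, "no error"
    if unanimous_a != unanimous_b:
        # First example: one vote agrees, the other does not, so the
        # replica of V1 in the disagreeing vote is designated as false.
        faulty = "b" if unanimous_a else "a"
        return v2, f"hardware fault suspected under V1 replica {faulty}"
    if v1_a == v1_b and v2 == v3:
        # Second example: both votes disagree and both replicas of V1
        # stand alone against V2 and V3.
        return v2, "software fault suspected in duplicated variant V1"
    # Third example (or multiple faults): mask by majority, no diagnosis.
    maj_a, maj_b = majority(vote_a), majority(vote_b)
    if maj_a is not None and maj_a == maj_b:
        return maj_a, "fault tolerated but not diagnosed"
    return None, "detected failure: no acceptable result"
```

For instance, `double_vote(1, 9, 1, 1)` masks the error and points at the second replica of V1, while `double_vote(1, 1, 9, 1)` masks the error without a diagnosis, which is the case where the text suggests reconfiguring the active variants before the next execution step.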

After a failed hardware component is 
made passive, the active variants are re¬ 
configured to distribute the SECAs among 
the remaining HECAs, forming disjoint 
areas. Figure 4 shows the distribution of 
active and idle variants among the three re¬ 
maining HECAs after any of the HECAs 
have been made passive. In each case, the 
reconfiguration affects only a single 
HECA. Further decisions in this architec¬ 
ture are made by a vote among the active 
variants on the remaining HECAs. Thus, 
the degraded architecture is the same as the 
NVP/1/1 architecture. 
Table 4. Probability of failure: $P_{S,X} = P_{SD,X} + P_{SU,X}$

RB/1/1
  Probability of detected failure, $P_{SD,RB}$:
    Separate: $(P_{I,RB})^2$
    Common-mode: $P_{ID,RB} + P_{2V,RB}$
  Probability of undetected failure, $P_{SU,RB}$:
    Common-mode: $P_{RVD,RB}$

NSCP/1/1
  Probability of detected failure, $P_{SD,NSCP}$:
    Separate: $4(P_{I,NSCP})^2\,[1 - P_{I,NSCP} + (P_{I,NSCP})^2/4]$
    Common-mode: $P_{ID,NSCP} + 4P_{2V,NSCP}$
  Probability of undetected failure, $P_{SU,NSCP}$:
    Common-mode: $P_{2V,NSCP} + 4P_{3V,NSCP} + P_{4V,NSCP} + P_{RVD,NSCP}$

NVP/1/1
  Probability of detected failure, $P_{SD,NVP}$:
    Separate: $3(P_{I,NVP})^2\,[1 - (2/3)P_{I,NVP}]$
    Common-mode: $P_{ID,NVP}$
  Probability of undetected failure, $P_{SU,NVP}$:
    Common-mode: $3P_{2V,NVP} + P_{3V,NVP} + P_{RVD,NVP}$


Table 5. Comparison of analytical and experimental results.

  $P_{I,NVP}$: $2.91 \times 10^{-3}$
  $P_{2V,NVP}$: $4.48 \times 10^{-6}$
  $P_{3V,NVP}$: $1.09 \times 10^{-6}$
  $P_{S,NVP}$ (Table 4): $3.90 \times 10^{-6}$
  $P_{S,NVP}$ (Experimental): $3.67 \times 10^{-6}$


NVP/2/2. To understand the effect of solid software faults on architectures that tolerate two faults, consider the NVP method. Such an architecture requires four disjoint HECAs and SECAs, hence the NVP/2/2 architecture.

This architecture might seem to be a direct extension of NVP/1/1, adding only one HECA and an associated SECA, but there are major differences in error processing. The fault-tolerance decision is now based on finding a single set of two or more agreeing results among the four variant results provided. Also, after the first discrepancy is discovered, the designated hardware component and its associated variant are made passive without any attempt to diagnose the fault as a hardware or software fault. Further decisions are then made by vote among the remaining variants, making the degraded architecture similar to the NVP/1/1 architecture. However, unlike the other NVP-based architectures, subsequent faults are treated the same as the first detected fault.
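The NVP/2/2 decision rule (a single set of two or more agreeing results, with disagreeing units made passive without diagnosis) can be sketched as follows. The function name `nvp22_decide` and the `(variant_id, value)` representation are assumptions made for illustration; agreement is modeled as plain equality.

```python
from collections import Counter


def nvp22_decide(results):
    """Apply the 'single agreeing set' rule to a list of (variant_id, value).

    Returns (value, outvoted_ids). The outvoted variants, together with
    their hardware components, are the ones to make passive. Returns
    (None, []) when there is no single set of two or more agreeing results.
    """
    counts = Counter(value for _, value in results)
    agreeing_values = [value for value, n in counts.items() if n >= 2]
    if len(agreeing_values) != 1:
        # Either nobody agrees, or two rival pairs agree: detected failure.
        return None, []
    chosen = agreeing_values[0]
    outvoted = [vid for vid, value in results if value != chosen]
    return chosen, outvoted
```

With four results, `[("V1", 5), ("V2", 5), ("V3", 6), ("V4", 7)]` still yields 5, which matches the tolerance of two simultaneous independent faults, whereas a two-against-two split yields no result. The same function applies unchanged to the three results that remain after one unit has been made passive.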

Besides tolerance to two consecutive 
independent software faults, this architec¬ 
ture lets the system tolerate two simultane¬ 
ous independent faults, detect related 
faults between two variants, and detect 
simultaneous faults in three or four vari¬ 
ants. The Fault-Tolerant Processor/ 
Attached Processor 4 is an implementation 
of this architecture: a quad configuration 
of the core fault-tolerant processor sup¬ 
ports the execution of four different pieces 
of application software on four distinct ap¬ 
plication processors. 



Analyzing and evaluating architectures


In discussing how to conduct a dependa¬ 
bility analysis of hardware- and software- 
fault-tolerant architectures when adopting 
a Markov approach, we consider three 
architectures that tolerate a single hard¬ 
ware or software fault: RB/1/1, NSCP/1/1 
and NVP/1/1. 

Analyzing software-fault-tolerant architectures. Our analysis emphasizes the distinctions among the different sources of failures (independent and related faults in the variants and the decider) and assumes that only one type of fault can cause errors during each execution. Also, we do not address the underlying fault-tolerance mechanisms, that is, establishment and restoration of recovery points for RB, and version synchronization, establishment of cross-check points, and the decision mechanisms for NSCP and NVP.

We classify failures as separate or com¬ 
mon-mode and as detected or undetected. 
Separate failures result from independent 
faults in the variants, whereas common¬ 
mode failures can result either from related 
faults or from independent faults in the 
decider. We also distinguish between two 
types of related faults: those among the 
variants and those between the variants 
and the decider. We consider a failure 
detected when the decider identifies no 
acceptable result and no output result is 
delivered. A failure is undetected when 
erroneous results are delivered. 

We also assume that the probability of a 
fault is identical for all variants of a given 
architecture. We make this assumption to 
simplify the notation; it does not alter the 
significance of the results (it is simple to 
deduce the generalization to the case where 
variant characteristics are distinguished). 

To characterize the probabilities of fail¬ 
ure, we introduce the following notation 
for the X/l/l architectures: 

• $P_{I,X}$ is the probability of activating an independent fault in one variant of X on execution

• $P_{ID,X}$ is the probability of activating an independent fault in the decider of X on execution

• $P_{nV,X}$ is the probability of activating a related fault among n variants of X on execution

• $P_{RVD,X}$ is the probability of activating a related fault among the variants and the decider of X on execution

• $P_{SD,X}$ is the probability of a detected failure of X on execution

• $P_{SU,X}$ is the probability of an undetected failure of X on execution

• $P_{S,X} = P_{SD,X} + P_{SU,X}$ is the probability of a failure of X on execution

Table 6. Specific state and transition definitions.

State 1
  RB/1/1: 2(RB + hardware component) operational
  NSCP/1/1: 2(2(variant + hardware component)) operational
  NVP/1/1: 3(variant + hardware component) operational

State 2
  RB/1/1: (RB + hardware component) operational
  NSCP/1/1: 2(variant + hardware component) operational
  NVP/1/1: 2(variant + hardware component) operational

State 3
  RB/1/1, NSCP/1/1, NVP/1/1: Detected failure

State 4
  RB/1/1, NSCP/1/1, NVP/1/1: Undetected failure

Transition 1 to 2
  RB/1/1: Covered hardware component failure: $2c\lambda_{H,RB}$
  NSCP/1/1: Hardware component failure: $4\lambda_{H,NSCP}$
  NVP/1/1: Hardware component failure: $3\lambda_{H,NVP}$

Transition 1 to 3
  RB/1/1: Noncovered hardware component failure or detected RB failure: $2\bar{c}\lambda_{H,RB} + \lambda_{SD,RB}$
  NSCP/1/1: Detected NSCP failure: $\lambda_{SD,NSCP}$
  NVP/1/1: Detected NVP failure: $\lambda_{SD,NVP}$

Transition 1 to 4
  RB/1/1: Undetected RB failure: $\lambda_{SU,RB}$
  NSCP/1/1: Undetected NSCP failure: $\lambda_{SU,NSCP}$
  NVP/1/1: Undetected NVP failure: $\lambda_{SU,NVP}$

Transition 2 to 3
  RB/1/1: Covered hardware component failure or detected RB failure: $c\lambda_{H,RB} + \lambda_{SD,RB}$
  NSCP/1/1: Hardware component failure or detected two-variant failure: $2\lambda_{H,NSCP} + \lambda_{SD,2V}$
  NVP/1/1: Hardware component failure or detected two-variant failure: $2\lambda_{H,NVP} + \lambda_{SD,2V}$

Transition 2 to 4
  RB/1/1: Noncovered hardware component failure or undetected RB failure: $\bar{c}\lambda_{H,RB} + \lambda_{SU,RB}$
  NSCP/1/1: Undetected two-variant failure: $\lambda_{SU,2V}$
  NVP/1/1: Undetected two-variant failure: $\lambda_{SU,2V}$

Table 4 summarizes the probabilities of 
failure and separates them into the sepa¬ 
rate/common-mode and detected/unde¬ 
tected categories. The table shows that 
either separate failures or common-mode 
failures can be detected, while only com¬ 
mon-mode failures can go undetected. 
Comparing the probabilities is difficult 
due to the different parameter values for 
each architecture. However, some general 
observations are possible. 

Although a large number of experiments 
have analyzed NVP, no quantitative study 
has reported the decider’s reliability. Still, 
the probabilities of failure associated with 
the deciders can differ significantly. Due 
to the generic character and functional 
simplicity of the NSCP (comparison) and 
NVP (voting) deciders, the probabilities 
are likely ranked as follows: 

$P_{ID,NSCP} \approx P_{ID,NVP} \ll P_{ID,RB}$

$P_{RVD,NSCP} \approx P_{RVD,NVP} \ll P_{RVD,RB}$


For separate failures, the influence of 
independent faults differs significantly. 
For RB, the probability of separate failure 
equals the square of the probability that 
independent faults will occur, while the 
probability for NVP is almost three times 
as much, and four times as much for NSCP. 
This difference results from the number of 
variants and the type of decision. How¬ 
ever, it does not mean that RB and NSCP 
are the only architectures that allow detec¬ 
tion of related-fault combinations among 
the variants. All related faults in NVP re¬ 
sult in undetected failures. (This is due to 
our analysis’ limitation to architectures 
that tolerate only one fault. Increasing the 
number of versions would allow NVP 
methods to detect some related faults.) 
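The factors of roughly three and four quoted above follow directly from the separate-failure expressions of Table 4. As a check, the sketch below evaluates those expressions and compares them with a brute-force enumeration of independent variant failures. The failure predicates are our reading of the decision schemes (RB fails when both alternates fail, NSCP when each self-checking pair contains a failed variant, NVP when two or more of three variants fail), and the value p = 0.01 is an arbitrary illustration, not a measured probability.

```python
from itertools import product


def separate_failure_formulas(p):
    """Separate-failure probabilities from Table 4, with P_I,X = p for all X."""
    return {
        "RB": p**2,
        "NSCP": 4 * p**2 * (1 - p + p**2 / 4),
        "NVP": 3 * p**2 * (1 - 2 * p / 3),
    }


def enumerate_failure(n_variants, p, system_fails):
    """Sum the probability of every failure pattern for which the system fails."""
    total = 0.0
    for pattern in product([False, True], repeat=n_variants):
        prob = 1.0
        for failed in pattern:
            prob *= p if failed else 1 - p
        if system_fails(pattern):
            total += prob
    return total


# Assumed failure predicates; each takes a tuple of per-variant failure flags.
SCHEMES = {
    "RB": (2, lambda f: f[0] and f[1]),
    "NSCP": (4, lambda f: (f[0] or f[1]) and (f[2] or f[3])),
    "NVP": (3, lambda f: sum(f) >= 2),
}

p = 0.01  # hypothetical probability of an independent fault per execution
formulas = separate_failure_formulas(p)
enumerated = {name: enumerate_failure(n, p, fails)
              for name, (n, fails) in SCHEMES.items()}
for name in SCHEMES:
    print(name, formulas[name], enumerated[name], formulas[name] / formulas["RB"])
```

For small p the printed ratios approach 4 for NSCP and 3 for NVP, which is the comparison made in the text.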

Although related faults among variants 
do not affect the probability of undetected 
failure for the RB architecture, they are the 
major contributor to undetected failure for 
NSCP and NVP. However, comparing the 
respective probabilities of undetected fail¬ 
ure is not simple. 

Experiments on multiversion software help us estimate some elementary probabilities of the expressions in Table 4. For example, Table 5 shows that the results obtained from the model agree with previous experimental results.⁹ The first three columns show the values derived for the model parameters from the experimental results. The fourth column gives the associated probability of failure, computed from the expressions of Table 4 using these parameter values and excluding decider fault parameters. The last column shows the experimental statistic for the probability of failure.

Evaluating hardware and software 
architectures. In modeling the behavior of 
the architectures, we assume 

• only one type of fault can cause an 
error (either hardware or software) 
during each execution; 

• the variant is not discarded after error 
detection and recovery, but is given 
the new input data at the next step (that 
is, software faults are soft); and 

• the hardware components and the soft¬ 
ware-fault-tolerant architectures have 
constant failure rates. 

Models. The generic model in Figure 5 describes the hardware and software behavior for the RB/1/1, NSCP/1/1, and NVP/1/1 architectures. Table 6 gives the specific definitions for states and interstate transitions of the model. The transition rates in Table 6 are based on the following notation:

• $c$ is the hardware coverage factor of the RB/1/1 architecture, where $\bar{c} = 1 - c$.

• $\lambda_{H,X}$ denotes the failure rate for a hardware component of the X/1/1 architecture.

• $\lambda_{SD,X}$ and $\lambda_{SU,X}$ denote the detected and undetected failure rates, respectively, for the fault-tolerant software X. If $\gamma$ denotes the application software’s execution rate, then we can express these failure rates as functions of the failure probabilities in Table 4: $\lambda_{SD,X} = [P_{SD,X}]\gamma$ and $\lambda_{SU,X} = [P_{SU,X}]\gamma$.

• $\lambda_{SD,2V}$ and $\lambda_{SU,2V}$ denote the application software detected and undetected failure rates, respectively, of the NSCP/1/1 and NVP/1/1 architectures after an HECA has been made passive. These rates are defined as $[P_{SD,2V}]\gamma$ and $[P_{SU,2V}]\gamma$, respectively, where the probabilities of detected and undetected failure in the degraded two-version configuration are defined as

$P_{SD,2V} = 2P_{I,2V}\,(1 - P_{I,2V}/2) + P_{ID,2V}$ and $P_{SU,2V} = P_{RVD,2V}$

Table 7. Time-dependent reliability and probability of undetected failure.

RB
  Reliability: $R_{RB}(t) = 1 - (2\bar{c}\lambda_{H,RB} + \lambda_{S,RB})\,t$
  Probability of undetected failure: $P_{U,RB}(t) = \lambda_{SU,RB}\,t$

NSCP
  Reliability: $R_{NSCP}(t) = 1 - \lambda_{S,NSCP}\,t$
  Probability of undetected failure: $P_{U,NSCP}(t) = \lambda_{SU,NSCP}\,t$

NVP
  Reliability: $R_{NVP}(t) = 1 - \lambda_{S,NVP}\,t$
  Probability of undetected failure: $P_{U,NVP}(t) = \lambda_{SU,NVP}\,t$

In RB/1/1, a hardware failure does not 
alter the architecture’s software-fault tol¬ 
erance, and a software failure does not 
alter its hardware-fault tolerance. We as¬ 
sume we can achieve near-perfect detec¬ 
tion coverage for hardware faults, since 
both HECAs run the same variant simulta¬ 
neously. Thus, the coverage considered 
here for the hardware-fault-tolerance 
mechanisms corresponds to local cover¬ 
age, due to the diagnostic program and the 
acceptance test’s capacity to identify hard¬ 
ware failures. 

In NSCP/1/1, hardware- and software- 
fault-tolerance techniques are not inde¬ 
pendent, since the HECAs and the SECAs 
match. After a hardware component fails, 
the corresponding HECA and SECA are 
discarded. The resulting architecture com¬ 
prises a pair of hardware components and 
a two-version software architecture, form¬ 


ing a self-checking hardware and software 
architecture. 

In NVP/1/1, hardware- and software- 
fault-tolerance techniques are again not 
independent. After a hardware unit has 
been made passive, the remaining archi¬ 
tecture is analogous to the degraded NSCP/ 
1/1 architecture. 

In both NSCP/1/1 and NVP/1/1, hard¬ 
ware faults are tolerated at the software 
layer through the decision algorithm (com¬ 
parison or vote). Accordingly, only the 
software level accounts for the associated 
coverage, which the decider incorporates 
in the probability of a fault occurring. In the 
degraded NSCP/1/1 and NVP/1/1 archi¬ 
tectures, the software is no longer fault- 
tolerant, so the variants’ failure rates are 
important to the failure rate of the applica¬ 
tion software’s degraded configuration. 

Model processing. Combining the processing model in Figure 5 with the transition rates in Table 6 lets us derive the time-dependent probabilities of detected and undetected failure: $P_{D,X}(t)$ and $P_{U,X}(t)$, respectively, where t denotes time. In practice, we are interested mostly in the probability of undetected failure and in the reliability: $R_X(t) = 1 - [P_{D,X}(t) + P_{U,X}(t)]$. We can simplify these expressions for short missions (with respect to the mean times to failure). The simplified, approximate expressions (see Table 7) show that RB/1/1’s reliability depends strongly on the coverage of fault diagnosis in the hardware components. Furthermore, the hardware failure rate is likely greater for RB/1/1 than for the other architectures, due to the extra memory needed to store the second variant. Also, to ensure near-perfect detection coverage, further hardware or software resources are needed to compare the results from each hardware processor, and storage is needed for the acceptance test and the diagnostic program.
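To see how the short-mission simplification behaves, the sketch below solves the generic four-state model (state 1 initial, state 2 degraded, states 3 and 4 absorbing) in closed form and compares the result with the first-order expressions. The RB/1/1 transition rates follow our reading of Table 6; every numeric value (hardware failure rate, coverage, software failure rates, mission time) is hypothetical and chosen only for illustration.

```python
import math


def four_state_probabilities(r12, r13, r14, r23, r24, t):
    """Exact state probabilities at time t, starting in state 1.

    rXY is the constant transition rate from state X to state Y.
    This sketch assumes positive and distinct total exit rates.
    """
    a = r12 + r13 + r14  # total exit rate from state 1
    b = r23 + r24        # total exit rate from state 2
    if a <= 0 or b <= 0 or math.isclose(a, b):
        raise ValueError("sketch assumes positive, distinct exit rates")
    p1 = math.exp(-a * t)
    p2 = r12 * (math.exp(-a * t) - math.exp(-b * t)) / (b - a)
    int_p1 = -math.expm1(-a * t) / a                       # integral of p1
    int_p2 = r12 * (int_p1 - (-math.expm1(-b * t) / b)) / (b - a)
    p_detected = r13 * int_p1 + r23 * int_p2
    p_undetected = r14 * int_p1 + r24 * int_p2
    return p1, p2, p_detected, p_undetected


# Hypothetical RB/1/1 parameters (per hour) and a 10-hour mission.
lam_h, c, lam_sd, lam_su, t = 1e-4, 0.95, 1e-5, 1e-6, 10.0
r12 = 2 * c * lam_h
r13 = 2 * (1 - c) * lam_h + lam_sd
r14 = lam_su
r23 = c * lam_h + lam_sd
r24 = (1 - c) * lam_h + lam_su

p1, p2, p_det, p_undet = four_state_probabilities(r12, r13, r14, r23, r24, t)
reliability_exact = p1 + p2
reliability_linear = 1 - (r13 + r14) * t   # first-order form, as in Table 7
undetected_linear = r14 * t
print(reliability_exact, reliability_linear, p_undet, undetected_linear)
```

The linear forms keep only the direct transitions out of state 1, so they are adequate only while the mission time is short relative to the mean times to failure.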

The expressions also reveal that the application software’s failure rate has an identical influence on the three architectures, although this is tempered by the differences between the associated probabilities (identified in the section on analyzing software-fault-tolerant architectures).


The emergence of hardware-fault-
tolerant commercial systems will 
increase the user’s perception of 
the influence of design faults, due to these 
systems’ tolerance of physical faults. 
Consequently, software-fault-tolerance 
schemes that use design diversity to give 
system users continuous service (as op¬ 
posed to current implementations that 
preserve system core integrity through the 
termination of erroneous tasks 8 ) are likely 
to spread from their current domain: 
safety-related systems. Accordingly, the 
approaches and results in this article are 
likely to apply more widely. ■ 
Reliability Estimation of
Fault-Tolerant Systems: 
Tools and Techniques 


Robert Geist, Clemson University 
Kishor Trivedi, Duke University 


The recent rapid growth in demand
for highly reliable computing 
power has focused attention on 
tools and techniques we might use to accu¬ 
rately estimate the reliability of a proposed 
computing system on the basis of models 
derived from the design of that system. Re¬ 
liability modeling of fault-tolerant com¬ 
puting systems has become an integral part 
of the system design process, especially 
for those systems with life-critical applica¬ 
tions such as aircraft and spacecraft flight 
control. 

Reliability modeling has also become an 
important arena in which to view the clas¬ 
sic struggle between model accuracy, that 
is, the extent to which a model of a system 
faithfully represents the system under 
study, and model tractability, that is, the 
extent to which the modeler can extract 
useful information from the model in a 
cost-effective manner. 

Within this arena, certain additional complexity constraints that typically render the classical modeling tools inadequate compound the difficulty in searching for solutions to this trade-off problem. One constraint is the huge disparity in state transition rates. A rate ratio (largest rate:smallest rate) of $10^{10}$ within a single model is not uncommon, yielding “stiff” systems of differential, integral, or algebraic equations, for which standard numerical techniques are largely inadequate. Great progress has been made in recent years on numerical techniques for solving stiff systems,¹ but a ratio of $10^{10}$ coupled with a system of size $10^5$ still appears to be out of reach.

Comparatively evaluating state-of-the-art tools and techniques helps us estimate the reliability of fault-tolerant computing systems. We consider design limitations, efficiency, and accuracy.

Our purpose here is to comparatively evaluate state-of-the-art tools and techniques for estimating the reliability of fault-tolerant computing systems. Our goal is to consider these tools and techniques from both ends of the struggle described. In particular, we will look closely at design limitations imposed by underlying model assumptions, on the one hand, and at the efficiency and accuracy of solution techniques employed, on the other hand.

0018-9162/90/0700-0052$01.00 © 1990 IEEE

Background 

Figure 1. Three-component reliability model.

Recall that if X is a random variable that denotes the lifetime or time-to-failure of a computing component or system and X has distribution function

$$F_X(t) = P(X \le t) \qquad (1)$$

then the reliability of the component or system, $R_X(t)$, is the probability that the system survives until time t, that is,

$$R_X(t) = P(X > t) = 1 - F_X(t) \qquad (2)$$

If $R_X(t)$ is differentiable, then the hazard rate or failure rate of the component or system is given by

$$h(t) = -\frac{R_X'(t)}{R_X(t)} = \frac{F_X'(t)}{1 - F_X(t)} \qquad (3)$$

which we can loosely interpret as the conditional rate of failure in the next instant given survival until t. In this case

$$F_X(t) = 1 - \exp\left(-\int_0^t h(x)\,dx\right) \qquad (4)$$

and hence a constant failure rate implies an exponential lifetime distribution.
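The link between a hazard rate and the resulting reliability can be checked numerically. The following sketch integrates an arbitrary hazard function with the trapezoidal rule and exponentiates the result. The function name `reliability_from_hazard`, the step count, and the two example hazard functions are illustrative assumptions.

```python
import math


def reliability_from_hazard(h, t, steps=10_000):
    """Return R(t) = exp(-integral of h over [0, t]) using the trapezoidal rule."""
    dx = t / steps
    integral = sum(0.5 * (h(i * dx) + h((i + 1) * dx)) * dx
                   for i in range(steps))
    return math.exp(-integral)


lam = 1e-3  # hypothetical constant failure rate per hour
r_constant = reliability_from_hazard(lambda x: lam, 100.0)

k = 1e-4    # hypothetical linearly increasing hazard h(x) = 2*k*x
r_increasing = reliability_from_hazard(lambda x: 2 * k * x, 50.0)

print(r_constant, math.exp(-lam * 100.0))    # constant hazard: exponential law
print(r_increasing, math.exp(-k * 50.0**2))  # linear hazard: exp(-k t^2)
```

With a constant hazard the computed reliability coincides with the exponential survival function, which is the special case noted in the text.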

High reliability for computing systems generally has resulted from some form of redundancy. Thus, elementary reliability models of fault-tolerant computing systems are often variations on the so-called NMR model (N-modular redundant) in which we assume that the system is composed of n identical and independent components, m or more of which must be functioning for the system to be operational. Under these simplifying assumptions, we can express the reliability of the system as

$$R_{NMR}(t) = \sum_{i=m}^{n} \binom{n}{i} R(t)^i \,(1 - R(t))^{n-i} \qquad (5)$$

where R(t) denotes component reliability. Special cases include the serial system (m = n), the parallel system (m = 1), and the triple modular redundant voting system (n = 3, m = 2).
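Equation 5 is straightforward to evaluate. A minimal sketch, assuming the component reliability at the time of interest is supplied as a plain number `r` (the function name `nmr_reliability` is ours):

```python
from math import comb


def nmr_reliability(n, m, r):
    """Reliability of an m-out-of-n system of independent, identical components."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(m, n + 1))


r = 0.9  # hypothetical component reliability at the time of interest
print("serial   (m = n = 3):", nmr_reliability(3, 3, r))
print("parallel (m = 1)    :", nmr_reliability(3, 1, r))
print("TMR      (m = 2)    :", nmr_reliability(3, 2, r))
```

The three special cases reduce to r^n, 1 - (1 - r)^n, and 3r^2 - 2r^3, respectively.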

A second elementary reliability model, the standby redundancy model, features n − 1 of n identical components maintained in a failure-free (powered-off) state. Upon failure of the single active component, one of the n − 1 powered-off components is switched into operation with no cost associated to the switching. The system lifetime random variable in this case is the sum of n identical component lifetime random variables, so

$$R_{NSR}(t) = 1 - \int_0^t dF^{(n)} \qquad (6)$$

where $dF^{(n)}$ denotes n-fold convolution.² As an example, if all units have a constant component failure rate, $\lambda$, and hence exponential lifetime distribution, then the system lifetime will have an n-stage Erlang distribution, so

$$R_{NSR}(t) = \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!}\, e^{-\lambda t} \qquad (7)$$
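For exponential components, the standby reliability is the survival function of an n-stage Erlang distribution, which can be evaluated directly. In this sketch the function name and the numeric values are illustrative assumptions.

```python
import math


def standby_reliability(n, lam, t):
    """Survival function of an n-stage Erlang lifetime with stage rate lam."""
    x = lam * t
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(n))


lam, t = 1e-3, 500.0  # hypothetical failure rate (per hour) and mission time
for n in (1, 2, 3):
    print(n, standby_reliability(n, lam, t))
```

With n = 1 the expression collapses to the single-component exponential reliability, and each additional spare adds one more term of the series.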

An identifiable first generation of reliability modeling tools was largely based on such models.³ Mathur and Avizienis proposed a combinatorial model in which the system of interest was represented as a series of subsystems, each a hybrid of the NMR and NSR (N-modular standby redundant) models discussed. Carter et al. extended this combinatorial model to allow for imperfect coverage of the reconfiguration (switch-in) mechanism. The Computer Aided Reliability Estimation (CARE) package was developed at the NASA Jet Propulsion Laboratory to automate reliability evaluation of systems modeled by this combinatorial approach. Raytheon later modified the package and named it CARE II.

Demands for increased accuracy in re¬ 
liability estimation quickly forced the 
development of more elaborate system 
models without the vastly simplifying 
independence assumptions. A second gen¬ 
eration of estimation tools thus arose based 
on Markovian methods, thereby allowing 
an important first-order dependence. 3 

A discrete-state, continuous-time Markov model is a collection of states together with nonnegative transition rates among these states. Let $a_{ij}$ denote the rate of transition from state i to state j. The Markov model then represents a time-evolving system that changes state according to these rules:

(1) Holding time in each state i is exponentially distributed with mean $1/\sum_{j \ne i} a_{ij}$.

(2) Upon exit from state i, state j is chosen next with probability $a_{ij}/\sum_{k \ne i} a_{ik}$.
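The two rules translate directly into a simulation loop. The sketch below is illustrative only: the dictionary representation of the rates, the function names, and the small three-state example are our assumptions and do not correspond to any of the tools discussed.

```python
import math
import random


def exit_rule(rates, i):
    """Return (mean holding time in state i, {next state: probability}).

    `rates` maps (i, j) pairs to the transition rate from i to j.
    """
    out = {j: a for (src, j), a in rates.items()
           if src == i and j != i and a > 0}
    total = sum(out.values())
    if total == 0:
        return math.inf, {}          # absorbing state
    return 1 / total, {j: a / total for j, a in out.items()}


def simulate(rates, start, horizon, rng):
    """Generate one sample path as a list of (entry time, state) pairs."""
    t, state, path = 0.0, start, [(0.0, start)]
    while True:
        mean_hold, probs = exit_rule(rates, state)
        if not probs:
            break                                  # absorbed
        t += rng.expovariate(1 / mean_hold)        # rule (1)
        if t > horizon:
            break
        state = rng.choices(list(probs), weights=list(probs.values()))[0]  # rule (2)
        path.append((t, state))
    return path


# Hypothetical example: state 2 -> 1 at rate 2.0, 2 -> 0 at 0.5, 1 -> 0 at 1.0.
rates = {(2, 1): 2.0, (2, 0): 0.5, (1, 0): 1.0}
print(exit_rule(rates, 2))
print(simulate(rates, start=2, horizon=100.0, rng=random.Random(0)))
```

Simulation of this kind becomes impractical when failures are rare events, which is one reason the tools discussed here rely on analytic or numerical solution instead.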

For example, consider the system reliability model of Figure 1. The system begins in state 3, with three operational components, each of which fails at rate $\lambda$. Upon failure, a recovery procedure (rate $\delta$ from state 2A) commences. Recovery (switch-out of the faulty component) is successful unless we have a single-point failure (rate $\sigma$) or another, near-coincident fault (rate $2\lambda$). After successful recovery from a first failure, a second failure again initiates a recovery procedure that competes with single-point failure and near-coincident fault. There are no repairs, so even if the system successfully reconfigures twice, we eventually have redundancy failure (RF).

Such Markov models are equivalent to special linear systems of ordinary differential equations. If $P_i(t)$ denotes the probability that the system is in state i at time t, $P(t) = (P_1(t), \ldots, P_n(t))$, and A is an n × n matrix whose off-diagonal entries are the transition rates $a_{ij}$ and diagonal entries $a_{ii} = -\sum_{j \ne i} a_{ij}$, then the Markov model is equivalent to the homogeneous linear system

$$P'(t) = P(t)A \qquad (8)$$
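As an illustration of Equation 8, the sketch below builds the matrix A for a small hypothetical model (two components in parallel, no repair) and integrates the system with a fixed-step fourth-order Runge-Kutta scheme written in plain Python. The model, the step count, and the function names are our assumptions. A fixed-step explicit scheme of this kind is workable only because this toy model is not stiff; it is not a suitable solver for the stiff systems described earlier.

```python
import math


def generator_matrix(n, rates):
    """Build A from {(i, j): a_ij}; each diagonal entry makes its row sum to zero."""
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in rates.items():
        A[i][j] = a
    for i in range(n):
        A[i][i] = -sum(A[i][j] for j in range(n) if j != i)
    return A


def vec_mat(p, A):
    """Row vector times matrix: (pA)_j = sum_i p_i * A[i][j]."""
    n = len(p)
    return [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]


def solve(p0, A, t, steps=1000):
    """Integrate P'(t) = P(t)A from 0 to t with classical RK4."""
    h = t / steps
    p = list(p0)
    for _ in range(steps):
        k1 = vec_mat(p, A)
        k2 = vec_mat([x + 0.5 * h * k for x, k in zip(p, k1)], A)
        k3 = vec_mat([x + 0.5 * h * k for x, k in zip(p, k2)], A)
        k4 = vec_mat([x + h * k for x, k in zip(p, k3)], A)
        p = [x + h * (a + 2 * b + 2 * c + d) / 6
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


lam, t = 1e-3, 100.0  # hypothetical failure rate (per hour) and mission time
# State 0: both components up; state 1: one up; state 2: system failed.
A = generator_matrix(3, {(0, 1): 2 * lam, (1, 2): lam})
p = solve([1.0, 0.0, 0.0], A, t)
print(p, "reliability:", p[0] + p[1])
```

For this model the closed-form solution is known: the probability of state 0 is exp(-2λt) and that of state 1 is 2(exp(-λt) - exp(-2λt)), which gives a convenient check on the integration.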

The first of the tools with a Markovian 
framework was Aries (automated reliabil¬ 
ity estimation system), developed by Avi¬ 
zienis et al. This framework added an 
enormous flexibility to the model specifi¬ 
cation process, and all major reliability 
estimation tools in use today can still trace 
their ancestry to the Aries effort. 

Still, some notable restrictions applied. 
All failure rates were assumed constant, 
and fault coverage was modeled by simply 
attaching constant coverage probabilities 
to the failure rates in the state transition 
rate matrix. Further, the Aries model solu¬ 
tion technique required distinct eigenval¬ 
ues for the transition rate matrix, although 
the ease of programming with this tech¬ 
nique certainly outweighed difficulties 
caused by this restriction in the reliability 
modeling environment. The earlier hybrid 
NMR-NSR models can easily be seen as 
special cases of the general Aries model. 

Another important effort was Surf (système d’évaluation de la sûreté de fonctionnement), a second-generation tool Laprie et al. developed. This was the first recognition of the potential seriousness of the constant failure rate assumption, an assumption whose merits are still hotly debated. The Surf designers removed this restriction from their tool by using the method of stages,² which allows one to approximate a generally distributed random variable by replacing it with a series-parallel cascade of exponential stages. Although this method of stages can easily lead to an enlarged state-space, the significance of the Surf contribution is substantial. Representation of time dependence of failure rates (as well as time dependence of detection, reconfiguration, and repair rates) remains one of the most important issues in reliability modeling today.
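A very simple instance of the method of stages is a pure series cascade, that is, an Erlang distribution. The sketch below picks the number of identical exponential stages from a target mean and coefficient of variation and then evaluates the resulting hazard rate. It covers only this series-only special case, not the general series-parallel cascade; the function names and the numbers are illustrative assumptions and do not describe how Surf selects stages.

```python
import math


def erlang_stages(mean, cv):
    """Choose k identical exponential stages matching `mean` and, roughly, `cv`.

    A k-stage Erlang has coefficient of variation 1/sqrt(k), so this
    series-only sketch can represent only 0 < cv <= 1.
    """
    if not 0 < cv <= 1:
        raise ValueError("series-only sketch requires 0 < cv <= 1")
    k = max(1, round(1 / cv**2))
    return k, k / mean               # (number of stages, rate of each stage)


def erlang_hazard(k, rate, t):
    """Hazard rate (density divided by survival) of a k-stage Erlang lifetime."""
    terms = [(rate * t)**i / math.factorial(i) for i in range(k)]
    density = rate * terms[-1] * math.exp(-rate * t)
    survival = math.exp(-rate * t) * sum(terms)
    return density / survival


k, rate = erlang_stages(mean=100.0, cv=0.5)   # hypothetical lifetime parameters
print(k, rate)
print([erlang_hazard(k, rate, t) for t in (10.0, 100.0, 1000.0)])
```

The hazard rate now increases with time even though every stage is exponential. The price is that each such lifetime contributes k states instead of one, which is the state-space enlargement the text mentions.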

A final second-generation tool of note 
was CAST, a Complementary Analytic 
Simulative Technique developed by Conn 
et al. to gauge the fault tolerance of 
NASA’s Shuttle-Orbiter Data Processing 
Subsystem. Although the functional ca¬ 
pacity of the tool per se is subsumed by 
Aries, the CAST effort must be recognized 
for the wealth of seminal ideas represented 
in its development. Among these were the 
special handling of transient and leaky 
transient faults (transients mistaken for 
permanent), the first detailed computation 
of coverage, and the suggested use of 
coordinated simulations in an otherwise 
analytic model. All of these ideas were 
later developed in great detail and are now 
included in several current tools. 

The third Computer Aided Reliability Evaluation package, CARE III, stands alone as a third-generation tool.⁴ It was developed by Stiffler et al. in response to limitations of the previous approaches that became evident with their attempted use in modeling ultrareliable (reliability > $1 - 10^{-9}$) fault-tolerant systems such as SIFT (software-implemented fault tolerance) and FTMP (fault-tolerant multiprocessor).⁵ The designers recognized that the fault/error-handling processes of detection, isolation, and recovery themselves required detailed modeling if they wished to accurately estimate the probability of coverage failure, now the primary cause of system failure. Further, they removed the assumption that all random variables of interest were exponentially distributed, first questioned by Surf’s developers, by incorporating both local and global time dependence for transition rates. Such time dependence alleviates the need for a state space expansion, at the expense of some model solution difficulty.

The CARE III designers addressed both the state-space size and numerical stiffness issues through an analytic approximation technique later termed behavioral decomposition. Specifically, they recognized that the state transitions in large models of ultrareliable systems naturally fell into two groups: relatively fast transitions that characterize fault/error-handling mechanisms, and relatively slow transitions that characterize fault-occurrence mechanisms. CARE III models thus contain separate fault/error-handling and fault-occurrence submodels. The fault-occurrence submodel is an arbitrary nonhomogeneous Markov chain, and the fault/error-handling submodel is a fixed-template semi-Markov model with selectable rates. The semi-Markov fault/error-handling model is solved in isolation for exit-path probabilities and holding-time functions that are then incorporated into the nonhomogeneous Markov fault-occurrence model through a numerical approximation.

The major innovations of the CARE III 
approach, behavioral decomposition and 
detailed representation of fault/error-han¬ 
dling, are both its contributions and the 
source of its limitations. The semi-Markov 
fault/error-handling template adds the 
potential for significant detail in accurate 
modeling of detection, isolation, and re¬ 
covery but, in practice, direct specification 
of semi-Markov transition rates comes no 
closer to user knowledge and ability than 
direct specification of a constant coverage 
probability. Further, the effects of the 
approximations in the solution technique 
are not bounded, and thus we don’t know 
the accuracy of the final reliability esti¬ 
mate. Still, the CARE III contributions to 
the state of the art, both as a tool and as a 
catalyst for further development, are ex¬ 
tremely significant. 

Current tools and 
techniques 

We now offer a detailed comparison of 
five current tools and techniques, at least 
one of which has been described by others 
as a fourth-generation reliability predic¬ 
tor. 4 Although these tools and the tech¬ 
niques upon which they are based differ 
markedly from one another, each can be 
seen as motivated by, if not directly de¬ 
rived from, the earlier generations of tools 
described so far. 

HARP. Our goals in the design of the 
Hybrid Automated Reliability Predictor 
were largely based on perceived limita¬ 
tions of CARE III. We wanted to retain the 
benefits of reduced state-space size and 
reduced numerical stiffness afforded by 
behavioral decomposition. We also 
wanted to 

• improve flexibility in specification of
the fault-occurrence behavior,

• improve flexibility in specification of
the fault/error-handling and recovery
processes,

• provide a provably conservative esti¬ 
mate of mission reliability, 

• provide a convenient facility for sensi¬ 
tivity analysis, and 

• reduce execution time. 

We begin with the fault/error-handling 
and recovery processes. Under the assump¬ 
tion of behavioral decomposition, all tran¬ 
sition rates internal to fault/error-handling 
and recovery fall within a few orders of 
magnitude of one another, so we need not 
be concerned with numerical stiffness or 
rare events in the solution of these sub¬ 
models. Further, the information that the 
fault/error-handling and recovery sub¬ 
model must supply to the overall system 
model is considerably less than that made 
available by a fully specified semi-Markov 
process. The information required from the submodel is simply the probability of reaching each exit, X, in an amount of time $\le t$ from entry, $P_X(t)$. These probabilities can be determined in isolation, without considering the effects of a second (near-coincident) fault. Thus $\sum_{X \in \mathrm{exits}} P_X(+\infty) = 1$.

The HARP tool allows three such exits, 
denoted C, R, and S, from each fault/error¬ 
handling submodel. The exit C represents 
the reconfiguration of the system to toler¬ 
ate a permanent or leaky transient fault 
(called coverage), exit R represents correct 
recovery from a transient fault (transient 
restoration), and exit S represents inability 
to recover from the fault that caused entry 
to the submodel (single-point failure). 

Since the information required of the 
fault/error-handling submodel is minimal, 
a variety of automated modeling tech¬ 
niques can be used to supply it, including 
simulation. The simulation framework 
included with HARP is the extended sto¬ 
chastic Petri net. 6 An ESPN is a directed 
bipartite graph whose two vertex sets are 
called places and transitions. Places 
contain tokens, and transitions have firing¬ 
time distributions. We use circles for 
places, bars for transitions, and small discs 
for tokens in representing these graphs. 
Two types of arcs, standard and inhibitor, 
connect places to transitions and transi¬ 
tions to places. Standard arcs are repre¬ 
sented with normal arrows on the heads 
and inhibitor arcs with small circles on the 
heads. 

The dynamics of ESPNs are rules for simulation. Should all standard arcs into a transition emanate from places containing one or more tokens, and all inhibitor arcs into this transition emanate from places containing zero tokens, the transition is enabled. Once the transition is enabled, an amount of time determined by a random sample from the associated distribution elapses. If the transition is still enabled at the end of this time, the transition fires, that is, removes one token from each standard input place and adds one token to each output place. An arc from a transition to a place can also be probabilistic, that is, a bifurcated arc that sends the output token along one branch with a specified probability p and along the other branch with probability 1 - p.
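The enabling and firing rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated, not the ESPN simulator shipped with HARP: the dictionary-based net representation, the function names `is_enabled` and `fire`, and the two example transitions are assumptions made for the sketch, and the sampling of firing times is left out.

```python
import random

def is_enabled(marking, transition):
    """True when every standard input place holds a token and every
    inhibitor input place is empty."""
    return (all(marking[p] >= 1 for p in transition["inputs"])
            and all(marking[p] == 0 for p in transition["inhibitors"]))

def fire(marking, transition, rng=random):
    """Return the marking after firing: one token leaves each standard
    input place and one token enters each output place."""
    if not is_enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1
    for out in transition["outputs"]:
        if isinstance(out, tuple):
            # Probabilistic arc (branch_a, branch_b, p): branch_a with
            # probability p, branch_b with probability 1 - p.
            branch_a, branch_b, p = out
            out = branch_a if rng.random() < p else branch_b
        new_marking[out] += 1
    return new_marking

# Hypothetical fragment: a fault token in TF forks into LF and SR.
fork = {"inputs": ["TF"], "inhibitors": [], "outputs": ["LF", "SR"]}
# Hypothetical transition that an inhibitor arc from LF can block.
guarded = {"inputs": ["SR"], "inhibitors": ["LF"], "outputs": ["D"]}

start = {"TF": 1, "LF": 0, "SR": 0, "D": 0}
after_fork = fire(start, fork)
```

A full simulator would add a firing-time distribution to each transition and re-check enabling when the sampled time elapses; the sketch covers only the token-movement rules.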

The ESPN structure allows convenient 
representation of concurrent behavior such 
as that in which failure processes are active 
concurrently with recovery processes. 
Consider the ESPN transient fault/error¬ 
handling model shown in Figure 2. Fault 
occurrence is represented by placement of 
a token in the TF place. The transition 
below then fires, giving rise to two concur¬ 
rent processes, one representing a latent 
fault (token in place LF) and one represent¬ 
ing system response (token in place SR). If 
the fault causes an error (transition tE 
fires), a single-point failure (token in place 
S) with probability p and error detection 
and isolation with probability 1 -p results. 
Detection could precede error generation 
(tD fires before tE). Upon fault detection 
(token in place D) a rollback procedure 
commences. At its completion (tR fires) 
either the transient has disappeared (token 
in place TD) and we obtain transient recov¬ 
ery (token in place R) or the transient is still 
present (no token in place TD) and we 
reconfigure (token in place C). 

Large classes of such nets can be trans¬ 
formed into Markov or semi-Markov pro¬ 
cesses and solved analytically for the 
probabilities of reaching each of the exits 
and the distributions of times to exit. 6 We 
have chosen to allow even more general 
nonsemi-Markov models and hence imple¬ 
ment only the simulative solution of ESPN 
fault/error-handling submodels in HARP. 

Other options are available for specifi¬ 
cation of the fault/error-handling and re¬ 
covery submodels, including direct input 
of empirical distributions. 

The HARP fault-occurrence submodel 
is a Markov model with possibly time- 
dependent rates of failure. A state is a 
vector whose coordinates indicate the 
number of operational components of each 
type and whether or not the system is 
operational. Component fault events cause 
transitions among the states. Complete 
specification of the states and transitions in 


large models can be arduous and error- 
prone, so we have provided an option for 
system specification using fault trees. 
HARP converts the fault-tree input to a 
Markov model. 

The fault/error-handling submodel 
probabilities serve to construct failure rate 
modifiers, which extend the notion of 
coverage probability. If $T_X$ denotes time (from fault/error-handling submodel entry) to reach submodel exit X (X = C, S, R), then we have the distribution of $T_X$:

$$F_{T_X}(t) = P_X(t) / P_X(+\infty) \qquad (9)$$


To actually reach exit X, we must do so prior to the occurrence of the next interfering fault. Let $\gamma_i(v)$ denote the collective rate of interfering faults at global time v, given that the system is in state i of the fault-occurrence model. Then for X = C, S, R, we define the state-dependent and possibly globally time-dependent rate modifier

$$x_i(v) = P_X(+\infty) \cdot \mathrm{Prob}(T_X < \text{time to next interfering fault}) = P_X(+\infty) \int_0^{\infty} e^{-\int_v^{v+t} \gamma_i(u)\,du}\, dF_{T_X}(t) \qquad (10)$$

Note that if $\gamma_i$ is constant, this becomes

$$x_i(v) = P_X(+\infty)\, L_{T_X}(\gamma_i) \qquad (11)$$

where $L_{T_X}$ denotes the Laplace-Stieltjes transform of $T_X$.

The fault-occurrence and fault/error-handling submodels then combine to create an overall system model with the form of a nonhomogeneous Markov process. If $P_i(t)$ is the probability that the system is in state i at time t and the row vector $P(t) = (P_0(t), \ldots, P_n(t))$, then the overall system model is given by the nonhomogeneous linear system of differential equations

$$P'(t) = P(t)A(t) \qquad (12)$$

In general, the entries of the transition rate matrix take the form $a_{ij}(t) = \lambda_{ij}(t)\,x_{ij}(t)$, $i \ne j$, where $\lambda_{ij}(t)$ is the rate of fault occurrence in state i which, if covered, would cause transition to state j, and $x_{ij}(t)$ is a modifier derived from a fault/error-handling submodel. We solve the system in Equation 12 numerically using a variation of the Runge-Kutta procedure, 7 from which we obtain reliability $R(t) = \sum_{i\ \mathrm{operational}} P_i(t)$. Should the matrix A(t) be stiff, HARP automatically switches to a special stiff solver, TR-BDF2. 1

Figure 3. Behavioral decomposition: (a) coverage submodel and (b) instantaneous coverage fault model.


Table 1. Probability of coverage failure $\times 10^6$.

Time    Exact         HARP
 0.0    0.00000000    0.00000000
 2.0    0.59916075    0.59946027
 4.0    1.19850077    1.19880035
 6.0    1.79772058    1.79802022
 8.0    2.39682015    2.39711985
10.0    2.99579945    2.99609921


This reliability estimate provides a fast 
approximation to the value obtained by 
numerically solving the “full model,” in 
which a complete fault/error-handling and 
recovery submodel, rather than just a rate 
modifier, is attached to each failure event 
transition. In practice, we have found this 
“instantaneous coverage” approximation 
to be extremely accurate. Moreover, for 
constant rate failure events the technique 
yields provably conservative estimates, 
that is, it is guaranteed to underestimate the 
actual reliability of the system as repre¬ 
sented by the full model. 7 

As an example, consider again the full 
model of Figure 1. A coverage model suit¬ 
able for both of the first two component 
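failures appears in Figure 3a. From Equation 11 we have for i = 1, 2

$$c_{i+1,i} = \frac{\delta}{\delta+\sigma}\, L_{T_C}(i\lambda) = \frac{\delta}{\delta+\sigma} \cdot \frac{\delta+\sigma}{i\lambda+\delta+\sigma} \qquad (13)$$

$$s_{i+1,i} = \frac{\sigma}{\sigma+\delta}\, L_{T_S}(i\lambda) = \frac{\sigma}{\sigma+\delta} \cdot \frac{\sigma+\delta}{i\lambda+\sigma+\delta} \qquad (14)$$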


and an overall instantaneous coverage 
model shown in Figure 3b. In Table 1, we 
compare the exact probability of coverage 
failure, that is, the probability of being in 
states SPF (single-point failure) or NCF 
(near-coincident failure) in Figure 1, with 
the HARP instantaneous coverage approximation for $\lambda = 0.0001$, $\sigma = 1.0$, $\delta = 1000.0$, and mission time t = 10 hours.
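As a rough check on the HARP column of Table 1, the instantaneous coverage model can be evaluated in closed form. The sketch below assumes the top-level chain has the structure given in the "markov top" listing of Figure 6 (3 to 2 to 1, with leaks to a coverage-failure state from states 3 and 2) and uses the simplified form of Equation 13 for the coverage modifier. The function names are hypothetical, and the closed-form integration is our own working rather than HARP's numerical procedure.

```python
import math

LAM, SIGMA, DELTA = 1e-4, 1.0, 1000.0   # parameters quoted for Table 1

def coverage(k):
    """Rate modifier for exit C when k components can still fail
    (interfering-fault rate k * LAM); Equation 13 after cancellation."""
    return DELTA / (k * LAM + DELTA + SIGMA)

def coverage_failure_probability(t):
    """Probability of having entered the coverage-failure state by time t."""
    c2, c1 = coverage(2), coverage(1)
    # Mass that leaves state 3 directly into coverage failure.
    from_3 = (1.0 - c2) * -math.expm1(-3 * LAM * t)
    # Integral over [0, t] of P_2(s) = 3*c2*(exp(-2*LAM*s) - exp(-3*LAM*s)).
    time_in_2 = 3.0 * c2 * (-math.expm1(-2 * LAM * t) / (2 * LAM)
                            + math.expm1(-3 * LAM * t) / (3 * LAM))
    # Mass that leaves state 2 into coverage failure.
    from_2 = 2 * LAM * (1.0 - c1) * time_in_2
    return from_3 + from_2

for hours in (2.0, 4.0, 6.0, 8.0, 10.0):
    print(f"{hours:5.1f}  {coverage_failure_probability(hours) * 1e6:.8f}")
```

If the assumed chain matches Figure 3b, the printed values should agree with the HARP column of Table 1 to the precision shown there.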

Sensitivity analysis is available in 
HARP through a bounds analysis that is 
significantly faster but less accurate than 
the instantaneous coverage procedure de¬ 
tailed above. 7 

Criticisms of HARP include inadequate 
solution speed for large (25,000 state) 
models and a failure to automatically de¬ 
tect transition rate disparities, forcing the 
user to specify the points of the behavioral 
decomposition. A more serious criticism 
involves the instantaneous coverage ap¬ 
proach to reliability estimation. In life- 
critical applications associated with flight 
control, it is extremely valuable to have 
provably conservative estimates of relia¬ 
bility. We have recently shown that the 
conservativeness of reliability estimates 
obtained through instantaneous coverage 
does not extend to the general time-de- 
pendent failure rate case, but this required 
unrealistic models in which component 
failures benefited system reliability. The 
case for realistic models remains an impor¬ 
tant open question. 

SURE. Butler et al. 8 developed the 
Semi-Markov Unreliability Range Estima¬ 
tor (SURE) based on approximation theo¬ 
rems. Although the approach differs mark¬ 
edly from that used in HARP, the goal is 
the same: fast and accurate reliability esti¬ 
mates for systems modeled by Markovian 
processes with extremely large state spaces 


and rare failure events. The SURE ap¬ 
proach to handling these large, stiff sys¬ 
tems can be characterized as tolerance, as 
opposed to the HARP approach of avoid¬ 
ance. 

The developers assumed failure transi¬ 
tions occur at constant rates. SURE then 
bounds unreliability, that is, the lifetime 
distribution, by enumerating the paths to 
system failure and separately bounding the 
probability of traversing each path prior to 
the specified mission time. 

The bounding procedure reorders the 
sequence of states and associated exit tran¬ 
sitions encountered on the path to failure. 
The reordered path contains three identifi¬ 
able subpaths. Subpath 1 contains all on- 
path failure transitions that compete only 
with off-path failure transitions. Subpath 2 
contains on-path recovery or reconfigura¬ 
tion transitions that compete with off-path 
failure transitions. Subpath 3 contains on- 
path failure transitions that compete with 
both off-path recovery transitions and off- 
path failure transitions. 

Although SURE provides both upper 
and lower bounds on the probability of 
path traversal, we concentrate here on the 
upper bound since it is the one that ulti¬ 
mately yields the conservative estimate of 
system reliability. Let the three subpaths of failure path p be denoted $p_1$, $p_2$, and $p_3$, and let step j on path $p_i$ be denoted $p_{ij}$. If P(p,T) is the probability of traversing path p in time less than or equal to T, then

$$P(p,T) \le P(p_1,T) \times P(p_3,+\infty) = P(p_1,T) \times \prod_j P(p_{3j},+\infty) \qquad (15)$$

Now, if step j of path $p_3$ contains on-path failure at rate $\alpha_j$, off-path failure at rate $\beta_j$, and off-path recovery or reconfiguration with distribution $F_{R_j}$, then

$$P(p_{3j},+\infty) = \int_0^{\infty} \alpha_j e^{-(\alpha_j+\beta_j)t}\,(1 - F_{R_j}(t))\,dt \le \alpha_j E[R_j] \qquad (16)$$

Thus $P(p,T) \le P(p_1,T) \times \prod_j \alpha_j E[R_j]$.

Figure 4. Subpaths for Semi-Markov Unreliability Range Estimator analysis.

Although these bounds might seem crude, in practice they are not. For example, consider the path to coverage failure in the model of Figure 1 given by 3, 2A, 2, 1A, NCF. In Figure 4, we show each of the subpaths. In this case, $P(p_1,T)$ is given by the standard combinatorial expression, $1 - (3e^{-2\lambda T} - 2e^{-3\lambda T})$ and, since recovery on Subpath 3 is exponentially distributed with mean $1/\delta$, we have

$$P(p,T) \le [1 - (3e^{-2\lambda T} - 2e^{-3\lambda T})]\,\lambda\,(1/\delta) \qquad (17)$$

If we do the same for all four of the paths to coverage failure and add, we obtain an estimate for the probability of coverage failure at 10 hours of $2.9990989018 \times 10^{-6}$. Compare this with the last row of Table 1.
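The single-path bound of Equation 17 is easy to evaluate numerically. The sketch below computes it for the one path discussed above, using the parameter values quoted for Table 1. The function names are hypothetical, and the other three paths to coverage failure are not covered, so this is not a reproduction of the summed SURE estimate.

```python
import math

LAM, DELTA, T = 1e-4, 1000.0, 10.0   # values quoted for Table 1

def subpath1_probability(t):
    """P(p_1, T): the combinatorial expression quoted with Equation 17."""
    return 1.0 - (3.0 * math.exp(-2 * LAM * t) - 2.0 * math.exp(-3 * LAM * t))

def path_upper_bound(p1, steps):
    """Upper bound from Equation 16: P(p_1, T) times the product of
    alpha_j * E[R_j] over the steps of Subpath 3."""
    bound = p1
    for alpha, mean_recovery in steps:
        bound *= alpha * mean_recovery
    return bound

# Subpath 3 of the path 3, 2A, 2, 1A, NCF has a single step:
# on-path failure rate LAM and mean recovery time 1/DELTA.
bound = path_upper_bound(subpath1_probability(T), [(LAM, 1.0 / DELTA)])
print(f"{bound:.4e}")
```

This path needs two covered faults followed by a near-coincident third, so its bound is several orders of magnitude below the total; the shorter paths to coverage failure dominate the sum.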

Although the conservative estimates of 
reliability produced by SURE are gener¬ 
ally not as accurate as those produced by 
HARP, the SURE approach offers several 
advantages. Almost always faster than 
HARP, it generally produces estimates 
accurate to several significant digits, 
which is often sufficient for design pur¬ 
poses. Further, SURE uses a procedure 
more elaborate than that detailed above to 
derive upper bounds on system reliability, 
hence the upper bounds are always at least 
as good as those produced by HARP. Thus, 
it more effectively addresses design cost 
issues. 

The most serious criticism of SURE 
arises from the semi-Markov assumption, 
which permanently precludes any global 
time dependence. Many researchers feel 


that component aging and mission envi¬ 
ronment variability will necessarily give 
rise to time-dependent failure rates over 
the course of the mission, and that failure 
to account for such could lead to unaccept¬ 
able errors. 

Heiress. Simulation-based models usu¬ 
ally offer greater flexibility and detailed 
representation than analytic models and, 
as a result, more realistic predictions. We 
have seen that simulation-based submod¬ 
els, first suggested by the designers of 
CAST, now play a major role in modeling 
fault/error-handling and recovery in 
HARP. 

Nevertheless, full model simulation has 
largely been discounted in reliability 
modeling on the basis of a simple statistical argument. Suppose we wish to estimate the probability p that a system reaches a model exit state (failure) before some time t by S(n)/n, where S(n) is the total number of times we reach the specified state in n simulation (or testing) trials of duration t. Then

$$P(S(n) \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i} \approx \sum_{i=0}^{k} \frac{e^{-np}(np)^i}{i!} = P(X > 1)$$

where X is a (k + 1)-stage Erlang random variable with parameter np. 2 If we require at least 95 percent confidence that we will not underestimate p by more than 10 percent, then we must have $P(S(n)/n < 0.9p) < 0.05$, that is, $P(X > 1) < 0.05$ where X is a $(\lceil 0.9np \rceil + 1)$-stage Erlang random variable. If X is Erlang, then 2npX is a chi-square random variable with $2(\lceil 0.9np \rceil + 1)$ degrees of freedom. Thus, we require $P(2npX > 2np) < 0.05$, that is,

$$n > \chi^2_{.05} / 2p \qquad (18)$$

where $\chi^2_{.05}$ is the high 0.05 percentile of a chi-square distribution with $2(\lceil 0.9np \rceil + 1)$ degrees of freedom. The problem is just this: a realistic value for p may be $10^{-9}$. If so, Equation 18 indicates that more than one trillion trials might be necessary, which would represent an unreasonable computation time.

However, recent advances in the appli¬ 
cation of variance reduction techniques 
indicate that we should not dismiss simula¬ 
tion too quickly in attacking this problem. 
The use of a statistical variance reduction 
technique called importance sampling 
could reinstate simulation as a viable alter¬ 
native for reliability estimation, even in 
situations requiring predictions of ultra- 
high reliability. 

The fundamental idea behind the appli¬ 
cation of importance sampling to the rare- 
event Markov model 9 is to force and bias 
transitions along the rare-event paths while 
dynamically maintaining a record of the 
forcing and biasing that allows postsimu¬ 
lation construction of an unbiased estima¬ 
tor of the event of interest (such as cover¬ 
age failure or system failure) with ex¬ 
tremely low variance. 

Figure 5. Semi-Markov middle-level model.

Here is a brief algorithmic description. Let $f(t|t',k')$ denote the density of holding time in state k' given entry at t'. Let $q(k|k')$ denote the probability that the next state is k. For each state k', let $L(k') = \{k \mid \text{transition } k' \Rightarrow k \text{ is at low rate}\}$ and let $H(k') = \{k \mid \text{transition } k' \Rightarrow k \text{ is at high rate}\}$. Let

$$f^*(t|t',k') = \frac{f(t|t',k')}{\int_{t'}^{T} f(\tau|t',k')\,d\tau} \qquad (19)$$

where T is mission time. For $k \in L(k')$, let

$$q^*(k|k') = \frac{x[k']\,q(k|k')}{\sum_{l \in L(k')} q(l|k')} \qquad (20)$$

and for $k \in H(k')$ let

$$q^*(k|k') = \frac{(1 - x[k'])\,q(k|k')}{\sum_{l \in H(k')} q(l|k')} \qquad (21)$$

where $x[k'] \in [0,1]$ is a bias factor. Then a trial is

    measure = 0;
    weight = 1;
    state = START;
    time = 0;
    while (not failure state and not end of mission) {
        generate next time (to state change) using f*();
        generate next state using q*();
        multiply weight by (f q)/(f* q*),
            evaluated at next state and time;
        if (failure state) add weight to measure;
    }

After M trials, the value $\sum_{i=1}^{M} \mathrm{measure}_i / M$ is an unbiased estimator of the probability of interest.
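The trial loop can be made concrete with a small Python sketch. It is not the Heiress implementation. The state graph below is our reading of the Figure 1 model, inferred from the path 3, 2A, 2, 1A, NCF and the parameters quoted for Table 1. Treating the single-point-failure transition (rate sigma) as a low-rate transition is an assumption, as are the function names, the bias value 0.9, and the trial count.

```python
import math
import random

LAM, DELTA, SIGMA, T = 1e-4, 1000.0, 1.0, 10.0

# Assumed reading of Figure 1: state -> [(next state, rate, low-rate?)].
MODEL = {
    "3":  [("2A", 3 * LAM, True)],
    "2A": [("2", DELTA, False), ("SPF", SIGMA, True), ("NCF", 2 * LAM, True)],
    "2":  [("1A", 2 * LAM, True)],
    "1A": [("1", DELTA, False), ("SPF", SIGMA, True), ("NCF", LAM, True)],
    "1":  [("RF", LAM, True)],
}
COVERAGE_FAILURE = {"SPF", "NCF"}

def run_trial(rng, bias):
    """One forced and biased trial; returns the likelihood-ratio weight if
    the trial ends in coverage failure, otherwise 0."""
    state, now, weight = "3", 0.0, 1.0
    while state in MODEL:                       # absorbing states have no entry
        arcs = MODEL[state]
        total = sum(rate for _, rate, _ in arcs)
        # Forcing: sample the holding time from the exponential density
        # conditioned on a state change before the end of the mission.
        p_change = -math.expm1(-total * max(T - now, 0.0))
        now += -math.log1p(-rng.random() * p_change) / total
        weight *= p_change                      # ratio f / f*
        # Biasing: mass `bias` on low-rate arcs, 1 - bias on high-rate arcs.
        low = sum(rate for _, rate, is_low in arcs if is_low)
        high = sum(rate for _, rate, is_low in arcs if not is_low)
        x = bias if low > 0 and high > 0 else (1.0 if low > 0 else 0.0)
        biased = [x * rate / low if is_low else (1.0 - x) * rate / high
                  for _, rate, is_low in arcs]
        pick = rng.choices(range(len(arcs)), weights=biased)[0]
        state, rate, _ = arcs[pick]
        weight *= (rate / total) / biased[pick]  # ratio q / q*
    return weight if state in COVERAGE_FAILURE else 0.0

def estimate(trials, bias=0.9, seed=1):
    """Average of the per-trial measures over the given number of trials."""
    rng = random.Random(seed)
    return sum(run_trial(rng, bias) for _ in range(trials)) / trials

print(estimate(20000))
```

Under these assumptions the estimate should land close to the exact 10-hour value in Table 1 after a few thousand trials, whereas unbiased simulation would almost never observe a failure in that many runs.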

Heiress (hierarchical estimation of in¬ 
terval reliability by skewed sampling) is a 
tool designed to provide ultrahigh reliabil¬ 
ity estimates using this technique. 10 We 
have used Heiress to estimate the probabil¬ 
ity of coverage failure at 10 hours for our 
model of Figure 1. Execution of one million trials required 32 minutes on a Sun-3/50 workstation and yielded a point estimate of $2.99580953 \times 10^{-6}$ with a 95 percent confidence interval of $[2.99569429, 2.99592476] \times 10^{-6}$.

Preliminary comparisons thus indicate 
that importance sampling will emerge as a 
viable alternative to analytic techniques. 


The advantages of simulation techniques 
in reliability estimation include 

• arbitrary precision with appropriate 
time commitment if approximate ana¬ 
lytic techniques do not provide ade¬ 
quate bounds, 

• a baseline against which new analytic 
techniques can be compared, 

• additional flexibility in model specifi¬ 
cation, and 

• a convenient technique for handling 
uncertainty in model parameter speci¬ 
fication: parameters may have distri¬ 
bution attached, and the sampling may 
include these distributions. 

Nevertheless, several extensions of the 
importance sampling technique will be 
required for Heiress to become a truly 
useful reliability estimation package. First, 
we must develop an algorithm for auto¬ 
mated selection of the bias parameters used 
in this technique. We have observed that 
small changes in these parameters can ef¬ 
fect substantial changes in simulation per¬ 
formance. 

Consider the natural approach of assign¬ 
ing the value 0.5 to the bias parameters. If 
we repeat our Heiress analysis of the model 
of Figure 1 using bias parameters 0.5, we 
find that one million trials require 52 
minutes of execution time to yield an estimate of coverage failure probability of $2.99494469 \times 10^{-6}$ and a 95 percent confidence interval of $[2.9890843521, 3.0008050430] \times 10^{-6}$. Thus, execution time has increased 60 percent and the width of the confidence interval has increased 5,000 percent over the values obtained earlier.

To date, bias parameter selection has 
been on an ad hoc, trial-and-error basis. A 
preliminary version of a new algorithm for 
this task 10 suggested unequal bias parame¬ 
ters near 1.0 for the model of Figure 1. 
With these values, we obtained narrower 
confidence intervals and reduced execu¬ 
tion time. 

We must also develop a facility for 
model specification in languages that re¬ 
liability engineers can use, for example, 
ESPN 6 and the reliability block diagram, 5 
rather than the Markov process. 

Finally, testing on full-scale models (of size $> 10^5$ states) must be carried out. It
might still be too costly to attempt simula¬ 
tion of full reliability models; relegation of 
this technique to fault/error-handling sub¬ 
models incorporating some rare events 
(such as near-coincident faults) might 
prove to be its most effective use. 


SHARPE. Using model hierarchy helps 
avoid both largeness and stiffness, as seen 
in the description of HARP, where a two- 
level hierarchy is used. We would like to 
extend the ability to hierarchically com¬ 
bine models to a greater depth and to ex¬ 
pand the menu of model types available to 
a user in this hierarchical framework. To 
the extent that a set of different model 
types is included, we have a toolkit as 
opposed to a tool at our disposal. We leave the composition of the model types to the user.

Model types can include common relia¬ 
bility/availability models such as fault 
trees, reliability block diagrams, and 
Markov and semi-Markov chains. Com¬ 
mon performance model types, such as 
product-form queueing networks 2 and se¬ 
ries parallel directed acyclic graphs, 11 are 
also included. Facilities for automatically 
generating a large Markov model from a 
more concise generalized stochastic Petri 
net 6 are available. To permit a combined 
evaluation of performance and reliability, 
we have included Markov and semi- 
Markov reward models. 

The main advantage of this approach is 
that the user can pick a natural model type 
for a given application, thereby avoiding 
any retrofitting. This clearly encourages 
the hybrid and hierarchical combination of 
submodels. 

Two types of quantities are computed as 
outputs of a SHARPE (symbolic hierarchi¬ 
cal automated reliability and performance 
evaluator) model: scalars and distribution 
functions. Scalars include probabilities 
and distribution moments. Closed-form 
distribution functions are also available as 
outputs of certain model types. For ex¬ 
ample, steady-state probabilities can be 
obtained for an irreducible Markov or 
semi-Markov chain. Throughput, average 
response time, and average queue length in 
the steady state are computed for a prod¬ 
uct-form queueing network. The mean, the 
variance, and the distribution of time to 
completion are computed for series-paral¬ 
lel directed acyclic graphs. The mean, the 
variance, and the distribution function of 
the time to reach an absorbing state are 
computed for a Markov or semi-Markov 
model. The mean, the variance, and the 
distribution of the time to failure are com¬ 
puted for a system modeled by a fault-tree 
or a reliability block diagram. 

Since input parameters to models can be symbolic expressions, we can use scalars computed from one submodel within an expression input to another submodel. Furthermore, many model types (fault tree, reliability block diagram, semi-Markov (reward) process, series parallel directed acyclic graph) allow distributions to be attached to various nodes. The node distribution is of the form

$$F(t) = \sum_j a_j\, t^{k_j} e^{b_j t} \qquad (22)$$

where $a_j$ and $b_j$ can be real or complex numbers and $k_j$ are nonnegative integers. The class of these functions is closed under convolution, mixing, order statistics, sum, product, integration, and differentiation. These operations are the ones used when computing the distributions for the chosen model types in SHARPE. Thus, the distributional output of a submodel can be the input distribution to another model.

    * three level model

    * bottom level
    markov bottom
    A C delta
    A S sigma
    end
    * specify initial state probabilities:
    A 1.0
    end

    * middle level
    semimark middle(k)
    fault second exp(k*lam)
    fault resolve cdf(bottom)
    end
    fault 1.0
    end

    * define coverage, c(k); include second fault effect
    func c(k)
    prob(middle,resolve;k) *prob(bottom,C)

    * top level
    markov top
    3 2 3*lam*c(2)
    3 cf 3*lam*(1-c(2))
    2 1 2*lam*c(1)
    2 cf 2*lam*(1-c(1))
    1 rf lam
    * attach reward rates to states:
    reward
    3 3.0
    2 2.0
    1 1.0
    cf 0.0
    rf 0.0
    end
    3 1.0
    end

    * call for 8-digit output and assign values to symbolic parameters:
    format 8
    bind
    lam 0.0001
    delta 1000.0
    sigma 1.0
    end

    * ask for numerical evaluation of cdf of time to coverage failure:
    eval(top,cf) 0.0 10.0 1.0
    * ask for expected reward rate at time t:
    eval(top) 0.0 10.0 1.0 exrt
    end

Figure 6. SHARPE input file.

As an example, we revisit the three- 
component model. First, we directly input 
the full Markov model of Figure 1 to 
SHARPE to obtain the closed-form answer 
for the distribution of time to coverage 
failure. In fact, the numbers in the Exact 
column of Table 1 were computed using 


SHARPE in this way. 

Next, we consider a three-level analysis 
of our system of Figure 1. The top-level 
model is the instantaneous coverage 
Markov model shown in Figure 3b. The 
bottom-level model is the single-fault 
coverage model of Figure 3a. The middle- 
level model is the semi-Markov model of 
Figure 5. First, the single-fault coverage 
model is exercised to compute the distribu¬ 
tion function of the time to each exit. This 
distribution function is passed to the 
middle-level model, where the events 
competing with the “resolve” exit event 
are the near-coincident failure events in 
the original full model. Rate modifier 
functions c(k) and s(k) are computed from 
this level and then passed to the top-level 
model. Figure 6 shows the SHARPE input 
file for the three-level hierarchical model. 
We can extend this top-level Markov 
model of the three-component system to 
include degradable levels of performance. 
Assume that the performance level (or reward rate) of state i is $r_i$. To compute $r_i$ we might use a product-form queueing network, a (semi-)Markov chain, or a generalized stochastic Petri net.

In this example, if we simply let $r_i = i$, the number of functional processors, we could then ask SHARPE to compute the expected number of functional processors at time t or the expected accumulated processor seconds in the interval (0,t) or the distribution of processor seconds until system failure. For instance, the expected number of functional processors at 10 hours is 2.99699551.
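The expected reward rate can be cross-checked outside SHARPE by integrating the top-level chain of Figure 6 directly. The sketch below uses a classical fourth-order Runge-Kutta step on P'(t) = P(t)A. This is our own choice of method for illustration; SHARPE itself works symbolically. The closed form used for c(k) is our evaluation of the product prob(middle,resolve;k) * prob(bottom,C), and all Python names are hypothetical.

```python
LAM, DELTA, SIGMA = 1e-4, 1000.0, 1.0
STATES = ["3", "2", "1", "cf", "rf"]
REWARD = [3.0, 2.0, 1.0, 0.0, 0.0]

def c(k):
    """Coverage with k components still able to fail:
    (delta+sigma)/(k*lam+delta+sigma) * delta/(delta+sigma)."""
    return DELTA / (k * LAM + DELTA + SIGMA)

def generator():
    """Transition rate matrix A of the top-level chain (rows sum to zero)."""
    idx = {s: i for i, s in enumerate(STATES)}
    arcs = [("3", "2", 3 * LAM * c(2)), ("3", "cf", 3 * LAM * (1 - c(2))),
            ("2", "1", 2 * LAM * c(1)), ("2", "cf", 2 * LAM * (1 - c(1))),
            ("1", "rf", LAM)]
    A = [[0.0] * len(STATES) for _ in STATES]
    for src, dst, rate in arcs:
        A[idx[src]][idx[dst]] += rate
        A[idx[src]][idx[src]] -= rate
    return A

def derivative(P, A):
    """Right-hand side of P'(t) = P(t) A for a row vector P."""
    n = len(P)
    return [sum(P[i] * A[i][j] for i in range(n)) for j in range(n)]

def solve(t_end, steps=1000):
    """Integrate from P(0) = (1, 0, 0, 0, 0) with classical RK4 steps."""
    A = generator()
    P = [1.0, 0.0, 0.0, 0.0, 0.0]
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(P, A)
        k2 = derivative([p + 0.5 * h * d for p, d in zip(P, k1)], A)
        k3 = derivative([p + 0.5 * h * d for p, d in zip(P, k2)], A)
        k4 = derivative([p + h * d for p, d in zip(P, k3)], A)
        P = [p + h / 6.0 * (a + 2 * b + 2 * cc + d)
             for p, a, b, cc, d in zip(P, k1, k2, k3, k4)]
    return P

P10 = solve(10.0)
expected_processors = sum(r * p for r, p in zip(REWARD, P10))
print(f"{expected_processors:.8f}")
```

The same state vector also gives the probability of coverage failure (the `cf` entry), which is the quantity compared in Table 1.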

Save. The instantaneous availability, A(t), of a system is the probability that the system is functioning at time t. In the absence of repairs from a system-failure state, A(t) = R(t), the system reliability. In the presence of repairs we have the more general expression

$$A(t) = R(t) + \int_0^t R(t - x)\, dM(x)$$

where M(x), the renewal function, is the expected number of repairs in the interval [0,x] (see Trivedi 2 ).

Table 2. Tool summary.

                      HARP                 SURE           Heiress          SHARPE          Save
Model basis
  Fault model         Nonhomogeneous       Markov         Nonhomogeneous   Coxian phase    Markov
                      Markov                              Markov
  Recovery model      ESPN; semi-Markov;   Semi-Markov    Semi-Markov      Coxian phase    Markov
                      empirical
Solution technique
  Fault model         Runge-Kutta          Failure path   Importance       Symbolic;       Randomization
                                           bounds         sampling         randomization
  Recovery model      Simulation           Failure path   Importance       Symbolic        Randomization
                                           bounds         sampling
Output                Reliability,         Reliability    Reliability      Performance,    Availability
                      availability                                         reliability
Operating system      Unix; VMS; MS-DOS    VMS            Unix             Unix; VMS       VM; MVS
environment
Contact               S. Bavuso,           R. Butler,     R. Geist,        K. Trivedi,     A. Goyal,
                      NASA LRC*            NASA LRC*      Clemson Univ.    Duke Univ.      IBM TJW**

* NASA Langley Research Center
** IBM T.J. Watson Research Center

The System Availability Estimator (Save) 12 was designed to evaluate both transient and steady-state availability ($\lim_{t \to \infty} A(t)$) of systems modeled by continuous-time Markov chains. Reliability evaluation is then a special case of its larger capability.

One of the major contributions of the 
Save effort is its flexible and powerful 
input language. We have seen that model 
specification can become an arduous and 
error-prone activity when users must specify the details of all state transitions in a huge ($10^5$-state) Markov process underlying a system reliability model. The Assist
package for SURE and the fault-tree and 
Petri net input facilities for HARP address 
this problem. The main construct for the 
Save input language is the “Component” 
data structure, which defines components 
or groups of components with identical 
behavior. A component can be a field re¬ 
placeable unit, a whole system, a card, a 
chip, or even software in operation. The 
structure is 


    COMPONENT: <comp-name> <no.-of-comps>
    SPARES: <no.-of-spares>
    SPARES FAILURE RATE: <expression>
    OPERATION DEPENDS UPON: <comp-name>, ...
    REPAIR DEPENDS UPON: <comp-name>, ...
    DORMANT WHEN SYSTEM DOWN: <yes|no>
    DORMANT FAILURE RATE: <expression>
    FAILURE RATE: <expression>
    FAILURE MODE PROBABILITIES: <prob-value>, ...
    REPAIR RATE: <expression>
    COMPONENTS AFFECTED: <comp-name>, ...

The structure provides for component 
interdependencies during both operation 
and repair, multiple failure modes, and 
mode-based failure rates, which facilitate 
representation of common designs such as 
hot and cold spares. 


From the Component descriptions, the Save system generates an underlying homogeneous Markov system whose transition matrix, A, is stored using sparse storage techniques. The underlying system equation is then given in matrix form as the homogeneous linear system of Equation 8.

Save employs randomization in solving Equation 8. Let $B = I + A/m$ where $m = \max_i |a_{ii}|$ and I is the identity. The solution to Equation 8 is

$$P(t) = P(0)\,e^{At} \qquad (23)$$

where $e^{At} = \sum_k A^k t^k / k!$ and $A^k$ denotes k-fold matrix multiplication. The expression in Equation 23 is not computationally useful itself, but observe that by substitution we have

$$P(t) = P(0)\,e^{(B - I)mt} \qquad (24)$$

$$= P(0) \sum_{k=0}^{\infty} B^k e^{-mt} (mt)^k / k! \qquad (25)$$

All terms in B are nonnegative and less than or equal to one, so that the error in using only the first K terms is bounded above by

$$1 - \sum_{k=0}^{K-1} e^{-mt} (mt)^k / k!$$

This bounded iteration combined with sparse storage is extremely effective, and state spaces of size $10^5$ are well within the Save capability.
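The randomization series of Equation 25 translates almost directly into code. The sketch below is a dense-matrix illustration in plain Python, not the Save implementation: it ignores sparse storage, the function name and the two-state example rates are hypothetical, and it assumes mt is small enough that exp(-mt) does not underflow.

```python
import math

def transient_probabilities(A, p0, t, eps=1e-12, max_terms=10000):
    """Approximate P(t) = P(0) exp(A t) by the truncated series of
    Equation 25; A is a generator matrix given as a list of rows."""
    n = len(A)
    m = max(abs(A[i][i]) for i in range(n))
    B = [[(1.0 if i == j else 0.0) + A[i][j] / m for j in range(n)]
         for i in range(n)]
    poisson = math.exp(-m * t)          # e^{-mt} (mt)^k / k! for k = 0
    v = list(p0)                        # row vector P(0) B^k
    result = [poisson * x for x in v]
    tail = 1.0 - poisson                # bound on the truncation error
    for k in range(1, max_terms):
        if tail <= eps:
            break
        v = [sum(v[i] * B[i][j] for i in range(n)) for j in range(n)]
        poisson *= m * t / k
        result = [r + poisson * x for r, x in zip(result, v)]
        tail -= poisson
    return result

# Two-state repairable unit (hypothetical rates): failure 0.5, repair 2.0.
A = [[-0.5, 0.5],
     [2.0, -2.0]]
availability = transient_probabilities(A, [1.0, 0.0], 1.3)[0]
print(availability)
```

Every term of the series is nonnegative, so the partial sums involve no cancellation, and the remaining Poisson tail gives the stopping rule.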

The Save approach does not specifically 
address stiffness, that is, numerical diffi¬ 
culties due to huge transition rate dispari¬ 
ties, nor does it permit global time depen¬ 
dence or exact representation of nonexpo¬ 
nential holding time distributions (semi- 
Markov components). Still, it is an excep¬ 
tionally efficient and elegant approach to 
solving the large class of availability prob¬ 
lems that do fall within its domain. 

Recent extensions include automated 
truncation of the Markov chain state space 
and simulation of large Markov models 
using importance sampling. 


We have presented a brief overview of probabilistic reliability models used
in the analysis of fault-tolerant multiple component systems. Table 2
contains a summary. We have seen that each of these fourth-generation
modeling approaches has drawn upon earlier second- and third-generation
efforts. In the same way, a next-generation effort will likely draw from
each of these techniques in attacking the much larger problem domain that
lies ahead.

The additional requirements will certainly include

• integration of system design and evaluation;

• integration of measurement data and modeling;

• integrated evaluation of hardware and software;

• explicit representation of both local and global time dependencies;

• multilevel, hierarchical model decomposition;

• error and sensitivity analysis;

• an input language that facilitates specification of intricate models
  with state spaces as large as $10^8$;

• representation of a broader notion of reliability: does the system
  deliver the performance for which it was designed;

• component failure models based on the physics of semiconductor
  electronics that attempt to account for the effects of temperature
  cycling, vibration, and varying load; and

• hybrid solution techniques that target submodels with the appropriate
  symbolic, numerical, or simulative routines.

Each of the techniques discussed has 
taken a large step in advancing the state of 
the art in reliability modeling. Many steps 
remain. ■ 
Error-Control Coding in Computers


Eiji Fujiwara, Tokyo Institute of Technology 
Dhiraj K. Pradhan, University of Massachusetts* 


Sophisticated error-correcting codes, now commonplace in both commercial
and noncommercial computers, contribute substantially to achieving
dependable, reliable systems. Prime examples can be found in a wide variety
of applications, including codes for high-speed and mass memories and even
for processors.

Coding for computers is a distinct discipline that, unlike coding for
communications, must satisfy very restrictive speed, power, and area
constraints.1,2 For the codes to be useful, any increase in computing speed
demands a corresponding increase in the encoding and decoding speed.
Therefore, high-speed VLSI implementation of encoders and decoders is a
major engineering concern. But speed is not the only concern; reliability
is equally important. Since encoders and decoders use the same technology
as the unit the code protects, they cannot be assumed to be fault-free.
Consequently, fault-tolerant implementation of encoders and decoders is
also a topic of importance.

Imaginative use of low-level error-control techniques can offset the need
for massive high-level redundancy. This article covers the application of
codes in actual systems.

In computer coding, we can sometimes rely on a priori information about
error location. For example, single-parity codes are not useful in
communications but can be quite useful in computers. In a RAM, fault
location can be determined by running diagnostics. Once the location of the
error is known, the bit can be treated as an erasure and corrected by a
single-parity code. Thus, coding for computers does not always require new,
sophisticated codes; a clever use of simple, existing codes may do the job
as well.
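The erasure idea can be illustrated with a short Python sketch. It assumes even parity over the whole word and that diagnostics have already supplied the index of the faulty bit; the function names are illustrative only.

```python
def parity_bit(data_bits):
    """Even-parity check bit: XOR of all data bits."""
    p = 0
    for bit in data_bits:
        p ^= bit
    return p

def correct_erasure(word, erased_index):
    """Rebuild the bit at erased_index from the remaining bits.

    word is data bits followed by one even-parity bit, so the XOR of all
    bits of a valid word is 0 and any one bit equals the XOR of the others.
    """
    value = 0
    for i, bit in enumerate(word):
        if i != erased_index:
            value ^= bit
    repaired = list(word)
    repaired[erased_index] = value
    return repaired

data = [1, 0, 1, 1]
stored = data + [parity_bit(data)]
readout = list(stored)
readout[2] ^= 1                      # cell 2 is known to be faulty
recovered = correct_erasure(readout, 2)
```

Without the location, the same parity bit could only detect that something is wrong; with it, the single check bit is enough to correct.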

Another, newer application of error-correcting codes is yield
improvement.2,3 As we approach RAM chips of 16 megabits and beyond, random
cell defects will be a major source of yield detraction. The conventional
spare-row-and-column techniques will be of limited use. But on-chip coding
could provide both benefits: protection against cell defects and protection
against soft errors.

*This work was performed while Pradhan was visiting Japan under a Japan
Society for Promotion of Science fellowship.

This article, intended for readers with 
basic knowledge in coding, surveys codes 
used in actual systems. 

Error control in high-speed memories

Because high-speed caches and main memories are prone to soft errors,
error-correcting codes are used in their design and, more recently, in the
design of on-chip memories. For a code to be useful for high-speed
memories, its structure must permit rapid, parallel encoding and decoding.
The complexity of the parity check circuit used in the encoder and decoder
can be a major factor in determining speed. By examining the structure of
the parity check matrix, known as the H matrix, we can estimate the
complexity of the parity check circuit.

For example, consider a length-6 code, n = 6, with three information bits,
k = 3, and three check bits, r = 3. The two H matrices below provide the
same error-correcting capability. Since all columns of each H matrix are
distinct, the code can correct all single errors, but the parity check
circuit for $H_1$ is less complex than that for $H_2$. $H_1$ requires only
three 2-input XOR gates to compute the parity checks, whereas $H_2$ needs
two 2-input XORs to compute $c_0$ and $c_1$ and one 3-input XOR to compute
$c_2$. Because of the 3-input XOR gate, the encoder and decoder using $H_2$
will be slower, as well as more complex.

With columns ordered $d_0\ d_1\ d_2\ c_0\ c_1\ c_2$:

$$H_1 = \begin{bmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
\begin{aligned}
c_0 &= d_0 \oplus d_2 \\
c_1 &= d_1 \oplus d_2 \\
c_2 &= d_0 \oplus d_1
\end{aligned}$$

$$H_2 = \begin{bmatrix}
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 1
\end{bmatrix}
\qquad
\begin{aligned}
c_0 &= d_0 \oplus d_2 \\
c_1 &= d_0 \oplus d_1 \\
c_2 &= d_0 \oplus d_1 \oplus d_2
\end{aligned}$$

Basically, the number of 1's in the H matrix determines the overall
complexity of the parity check circuit. The fewer the number of 1's, the
less complex the circuit. Also, the slowest parity check circuit
corresponds to the row with the maximum number of 1's. Therefore, it is
important to keep both the number of 1's overall and the number of 1's in
any row to a minimum.
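A small Python sketch makes the comparison concrete for the two length-6 matrices, written out here from their parity equations. The function and variable names are illustrative; the "fan-in" reported is simply the number of data bits feeding each check bit, which is the XOR input count the text refers to.

```python
# Columns are ordered d0 d1 d2 c0 c1 c2.
H1 = [[1, 0, 1, 1, 0, 0],
      [0, 1, 1, 0, 1, 0],
      [1, 1, 0, 0, 0, 1]]
H2 = [[1, 0, 1, 1, 0, 0],
      [1, 1, 0, 0, 1, 0],
      [1, 1, 1, 0, 0, 1]]

def complexity(H, k):
    """Return (total number of 1's, largest row weight, encoder XOR fan-ins)."""
    total_ones = sum(sum(row) for row in H)
    max_row_weight = max(sum(row) for row in H)
    # Each check bit is the XOR of the data bits selected by its row.
    fan_ins = [sum(row[:k]) for row in H]
    return total_ones, max_row_weight, fan_ins

def single_errors_correctable(H):
    """True if all columns are nonzero and distinct."""
    columns = [tuple(col) for col in zip(*H)]
    return all(any(c) for c in columns) and len(set(columns)) == len(columns)

summary_1 = complexity(H1, 3)
summary_2 = complexity(H2, 3)
```

Both matrices pass the distinct-column test, but `H2` has one more 1 and one row that needs a 3-input XOR, which is the difference in speed and gate count described above.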

Bit error-correcting error-detecting codes. In high-speed memories,
single-bit error-correcting and double-bit error-detecting codes (SEC-DED
codes) are most commonly used. This is because many semiconductor RAM chips
are organized for one bit of data output at a time; therefore, any failure
in one chip manifests itself as one bit in error.

Hsiao codes. Let's consider two odd-weight r-tuples. Here, weight refers to
the number of 1's. Note that the sum of two odd-weight r-tuples is an
even-weight r-tuple. Because of this property, an SEC-DED code with r check
bits can be constructed. Here, the H matrix consists of odd-weight column
vectors. Thus, the double-bit error syndrome is an even-weight r-tuple and
is distinguished from the single-bit error syndrome, which is an odd-weight
r-tuple. This code is different from the original Hamming SEC-DED code,
whose H matrix has an all-1 row vector in addition to the SEC code H
matrix.

The Hsiao class of codes4 is optimal because it has a minimum number of 1's
in the H matrix, which makes the encoding/decoding circuit simple. To
obtain a high-speed encoding/decoding circuit, the number of 1's in each
row is made equal or as close as possible to the average number. The
maximum code length is equal to the maximum number of r-tuples of odd
weight and thus $n = 2^{r-1}$.

Figure 1 is a simple example of this class of code and its parallel
decoding circuit. Figure 2 is an example of the H matrix for the (72, 64)
SEC-DED code with code length n = 72, information bits k = 64, and check
bits r = n - k = 8.

For any SEC-DED code, the probability of miscorrection when triple or more
errors occur must be minimal. A miscorrection here refers to an erroneous
decoding that results in an erroneous corrected word. For example, for the
code in Figure 1, a triple error in an all-1 code word 11111111 can produce
11111000. This will be miscorrected as a single error, and the decoder will
output the erroneous code word 11101000.
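The decoding rule and the miscorrection example can be traced with a short Python sketch. The H matrix is the (8, 4) odd-weight-column code of Figure 1; the function names and status strings are illustrative, and the software loop stands in for what the hardware does in parallel.

```python
# Figure 1 code: columns d0 d1 d2 d3 c0 c1 c2 c3, every column of odd weight.
H = [[1, 1, 1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 1, 0, 0],
     [1, 0, 1, 1, 0, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 0, 1]]

def syndrome(word):
    """S = D . H^T over GF(2)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def decode(word):
    s = syndrome(word)
    if not any(s):
        return list(word), "no error"
    if sum(s) % 2 == 0:
        # Sum of two odd-weight columns: even weight, so not a single error.
        return list(word), "double error detected"
    columns = [list(col) for col in zip(*H)]
    if s in columns:
        fixed = list(word)
        fixed[columns.index(s)] ^= 1      # flip the bit whose column matches
        return fixed, "single error corrected"
    return list(word), "uncorrectable error detected"

# Triple error in the all-1 code word: c1, c2, c3 are flipped.
received = [1, 1, 1, 1, 1, 0, 0, 0]
output, status = decode(received)
```

Here the three flipped check bits produce the odd-weight syndrome 0111, which equals the column of d3, so the decoder flips d3 and reports a single corrected error even though three bits were wrong.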

For the code in Figure 2, simulation gives a 43.72 percent probability of
detecting triple errors and a 99.19 percent probability of detecting
quadruple errors. On the other hand, a non-odd-weight-column code having
the same code parameters has a 24.0 to 43.5 percent probability of
detecting triple errors and a 98.90 to 99.18 percent probability of
detecting quadruple errors.

These results show that odd-weight-column SEC-DED codes have two practical
advantages: encoder/decoder simplicity and lower probability of erroneous
decoding. These codes are therefore widely used, for example, in IBM, Cray,
and Tandem systems. Commercially available parallel
error-detection-and-correction ICs are based on these codes.

Error control for multiple-bit errors. High-density memory chips create new
reliability problems. One example is the alpha particle problem in
high-density semiconductor RAM chips. These soft errors may line up with
existing hard errors, giving rise to multiple errors that are not
correctable with SEC-DED codes.

The direct method for correcting multiple errors is to use
multiple-error-correcting codes. Therefore, random double-bit
error-correcting (DEC) codes are becoming increasingly important. The
well-known BCH code,1 constructed using finite-field theory, and the
majority-logic decodable code are viable candidates for double error
correction in memory. But these codes require twice as many check bits as
the SEC-DED codes, and the decoding is correspondingly more complex.

To solve the above problems, low-cost techniques using extensions of
SEC-DED codes have been proposed. These techniques use erasure correction
for errors whose location is known a priori. This location information
enables a distance-4 code (SEC-DED code) to correct up to three erasures.
In a different method based on address skewing,1 multiple errors on the
same address are dispersed as single errors in different addresses. These
single errors can then be corrected. Another technique for multiple error
correction is the read-retry technique.1 In this technique, repeated read
cycles are used to eliminate the soft errors. Sparing replaces a defective
component with a spare one. This masking of hard faults requires some
additional memory read and write operations for detection and correction.

Byte error-correcting/detecting codes. Certain high-density semiconductor
memory chips are organized b bits wide. If a failure occurs, the resulting
word read-out is likely to have a b-bit block (byte) in error. In this kind
of application, it is desirable to have an error-correcting code capable of
correcting/detecting byte errors as well as bit errors.

Byte error-correcting codes. The H matrix for a single-byte
error-correcting code is constructed as follows: Choose as columns of the H
matrix all the nonzero r-tuples of elements from a finite field F, in
particular, from Galois field $GF(2^b)$, such that no column of H is a
multiple of another column. Thus, every pair of columns is linearly
independent, and the code has a minimum Hamming distance of 3. The code,
known as SbEC code,1 is capable of correcting all single b-bit byte errors.

Implementing this type of code requires transforming the H matrix over
$GF(2^b)$ to a binary form. By using a binary primitive polynomial g(x) of
degree b, we can define a nonsingular matrix T, expressed as a (b x b)
binary matrix.1 The set of these matrices is a field that is isomorphic to
$GF(2^b)$. Therefore, the elements of $GF(2^b)$
Information bits $d_0\ d_1\ d_2\ d_3$, check bits $c_0\ c_1\ c_2\ c_3$:

$$H = \begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}$$

Memory readout word: $D = [d_0, d_1, d_2, d_3, c_0, c_1, c_2, c_3]$

Syndrome: $S = D \cdot H^T = [s_0, s_1, s_2, s_3]$

$$\begin{aligned}
s_0 &= d_0 \oplus d_1 \oplus d_2 \oplus c_0 \\
s_1 &= d_0 \oplus d_1 \oplus d_3 \oplus c_1 \\
s_2 &= d_0 \oplus d_2 \oplus d_3 \oplus c_2 \\
s_3 &= d_1 \oplus d_2 \oplus d_3 \oplus c_3
\end{aligned}$$

($\oplus$: modulo 2 addition)

Figure 1. A simple odd-weight-column code and its parallel decoding circuit.

Figure 2. An H matrix for the (72, 64) SEC-DED code.

Figure 3. SbEC-DbED extension of Reed-Solomon code.


can be expressed as $\{0, T, T^2, T^3, \ldots, T^{2^b-2}, T^{2^b-1} = I\}$,
where I is the (b x b) identity matrix and 0 is the (b x b) zero matrix.
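The matrix representation can be checked with a short Python sketch. It assumes T is taken to be the companion matrix of g(x), a common choice that the text does not spell out, and uses the degree-4 primitive polynomial $x^4 + x + 1$ as the example; all function names are illustrative.

```python
def companion(coeffs):
    """Companion matrix of g(x) = x^b + a_{b-1} x^{b-1} + ... + a_0 over GF(2).

    coeffs = [a_0, a_1, ..., a_{b-1}]. Column j maps x^j to x^(j+1), and the
    last column holds x^b reduced modulo g(x).
    """
    b = len(coeffs)
    T = [[0] * b for _ in range(b)]
    for i in range(1, b):
        T[i][i - 1] = 1
    for i in range(b):
        T[i][b - 1] = coeffs[i]
    return T

def mat_mul(X, Y):
    b = len(X)
    return [[sum(X[i][k] & Y[k][j] for k in range(b)) % 2 for j in range(b)]
            for i in range(b)]

def mat_add(X, Y):
    return [[x ^ y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(b):
    return [[1 if i == j else 0 for j in range(b)] for i in range(b)]

def powers(T):
    """Return [T, T^2, ..., T^(2^b - 1)]."""
    out, P = [], T
    for _ in range(2 ** len(T) - 1):
        out.append(P)
        P = mat_mul(P, T)
    return out

T = companion([1, 1, 0, 0])            # g(x) = x^4 + x + 1
field = powers(T)                      # the 15 nonzero elements
```

For this polynomial the 15 powers are all distinct, the last one is the identity, and sums of elements stay inside the set (for instance $T^4 = T + I$), which is what lets each symbol of $GF(2^4)$ in an H matrix be replaced by a 4 x 4 binary block.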

The symbols n and k denote the code length and the information length,
respectively, of this type of code over $GF(2^b)$. The derived SbEC code is
an (N, K) code in binary form, where $N = n \cdot b$ and $K = k \cdot b$.
Similarly, the number of check bits is $R = r \cdot b = (n - k) \cdot b$.
The maximum length (in bits) of this Hamming-type code is given by
$N_h = b \cdot n = b \cdot (2^{br} - 1)/(2^b - 1)$.

Fujiwara5 has derived a new type of SbEC code that has an odd-weight column
characteristic. This type of code over $GF(2^b)$, in general, satisfies the
following condition for every distinct column vector having r elements in
the H matrix:

$$\sum_{i=0}^{r-1} h_{ij} = I, \quad \text{for columns } j = 0, 1, \ldots, n - 1$$

where $h_{ij}$ is the i-th element ($\in GF(2^b)$) in the j-th column
vector, $I$ is an identity element in $GF(2^b)$, and $\sum$ means summation
in $GF(2^b)$. It is easily proved that the corresponding binary converted
form of the H matrix has an odd-weight column characteristic over the
binary field. No two columns are identical, and no column is all-0 or a
multiple of another column. Therefore, two distinct columns are a linearly
independent pair, and the code is at least distance-3. The maximum code bit
length of the Fujiwara code is $N_f = b \cdot 2^{b(r-1)}$. For b = 1, this
is equivalent to the odd-weight-column SEC-DED code and therefore includes
it. Another important feature of this code is its error-detection
capability for certain double-byte errors. That is, the code can always
detect two byte errors, $E_i$ and $E_j$ ($i \neq j$), provided their error
patterns $E_i$ and $E_j$ are equal.

Because SbEC codes do not guarantee 
detection of random double-bit, spanning 


Figure 4. The (80, 64) S4EC-D4ED code used in the Fujitsu 380/382 system.
(The H matrix is divided into module 0, module 1, and a check part.)


over double-byte, errors, these codes are not used in computer systems.
Instead, computers use single b-bit byte error-correcting and double b-bit
byte error-detecting codes, called SbEC-DbED codes. Reed-Solomon codes are
a general class of codes of any distance d over GF(q), from which, as a
special case, we can derive SbEC-DbED codes of distance 4 over $GF(2^b)$.
The proposed extension appends three columns to the H matrix of the
distance-4 R-S codes. The H matrix in Figure 3 shows this extended code,
where $\{0, T, T^2, \ldots, T^{2^b-2}, T^{2^b-1} = I\} \cong GF(2^b)$. The
bit length of the code is equal to $b(2^b + 2)$. Therefore, such codes do
not exist for information lengths of k = 64 and 128 with byte lengths of
b = 2, 3, and 4.
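The length formulas quoted for these byte codes are easy to tabulate. The Python sketch below is only arithmetic on those formulas; the function names are illustrative, and the figure of three check bytes for the extended R-S code is an assumption based on the general fact that a distance-4 R-S code carries d - 1 = 3 check symbols.

```python
def hamming_type_bits(b, r):
    """Maximum SbEC length in bits: N_h = b (2^(br) - 1) / (2^b - 1)."""
    return b * (2 ** (b * r) - 1) // (2 ** b - 1)

def fujiwara_bits(b, r):
    """Maximum odd-weight-column SbEC length in bits: N_f = b 2^(b(r-1))."""
    return b * 2 ** (b * (r - 1))

def extended_rs_bits(b):
    """Length in bits of the extended distance-4 R-S code: b (2^b + 2)."""
    return b * (2 ** b + 2)

def extended_rs_info_bits(b, check_bytes=3):
    """Information bits left after the (assumed) three check bytes."""
    return extended_rs_bits(b) - check_bytes * b

info_bits = {b: extended_rs_info_bits(b) for b in (2, 3, 4)}
```

Even the largest case, b = 4, yields a 72-bit code with 60 information bits, which is why the extended code cannot cover 64-bit or 128-bit words at these byte lengths.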

Kaneda and Fujiwara* have proposed a class of SbEC-DbED codes having
arbitrary code and byte length. First, the H matrix shown in Figure 3 is
converted to an H matrix whose first row is an all-I vector. We can write
the converted matrix $H_1$ as

$$H_1 = \begin{bmatrix}
I & I & I & \cdots & I \\
h_0 & h_1 & h_2 & \cdots & h_{n_1-1}
\end{bmatrix}$$

where $h_0, h_1, \ldots, h_{n_1-1}$ are column vectors with two elements
each, where $n_1 \le 2^b + 2$. Using the above defined matrix $H_1$ of an
$(N_1, N_1 - R_1)$ SbEC-DbED code, where $N_1 = n_1 \cdot b$ and
$R_1 = r_1 \cdot b$, the following H matrix is an SbEC-DbED code of length
$(n_1 \times n_2)$ bytes with $(r_1 + r_2 - 1)$ check bytes:

$$H = \begin{bmatrix}
H_2 & H_2 & \cdots & H_2 \\
h_0\, h_0 \cdots h_0 & h_1\, h_1 \cdots h_1 & \cdots & h_{n_1-1}\, h_{n_1-1} \cdots h_{n_1-1}
\end{bmatrix}$$


Fujiwara,1 a recent and comprehensive secondary reference. See the
additional reading list at the end of this article for publication
information regarding Kaneda and Fujiwara, as well as for several other
original

1000 0010 1000 1111 0100 0000 0001 0010 1000 0010 1000 1111 1111 0001 0100 0001 1001 0100
0110 0100 1111 1100 0100 0010 1000 0001 1100 1000 0011 0100 1000 0010 1101 1000 0010 1000
1001 1101 1010 0001 0001 0011 1100 0001 0001 0000 1000 0011 0100 1100 1000 1000 1101 0010
0001 1000 0010 1000 1100 1000 0101 0010 0100 1111 0010 0101 0001 0110 0010 0100 1011 0001
0010 1000 0100 1000 0010 0100 0001 0101 1001 0100 1111 0010 0010 0011 1010 1001 0100 0011
1100 0100 0100 0010 0010 0001 0100 1000 0010 0011 1100 0100 1000 1001 0001 1100 1111 0101
0011 0001 0001 1011 1010 0010 0010 0100 0110 0001 0010 1100 0011 1000 1111 0000 1000 0010
0100 0001 0001 0100 1000 0001 1000 0110 0011 1100 1111 1000 0100 0100 1000 1111 0001 0100

Figure 5. Code used in the Fujitsu M-780 system.
Here, $H_2$ is a nonconverted H matrix of an $(N_2, N_2 - R_2)$ SbEC-DbED
code, where $N_2 = n_2 \cdot b$ and $R_2 = r_2 \cdot b$.

An interesting type of SbEC-DbED code is the modularized code. Figure 4 is
a modularized code used in the Fujitsu 380/382 system. The matrix T is
derived from the primitive polynomial $g(x) = x^4 + x + 1$. The code
structure gives a modularized organization to the encoding/decoding circuit
such that the entire circuit can be constructed from the same two
subcircuits corresponding to module 0 and module 1 in the figure.

Byte error-detecting SEC-DED codes. A further extension is a class of codes
to detect single-byte errors, as well as correct single-bit errors and
detect double-bit errors. SEC-DED-SbED codes can be attractive, since they
require only a small increase in redundancy. These codes have been studied
extensively,1 but the "best" code to meet the upper bound on code length
for an arbitrary byte length b has yet to be found. However, we can realize
an SEC-DED-S4ED code of r = 8 check bits corresponding to the SEC-DED code
(with K = 64 information bits). This type of code has already found
application in the Fujitsu M-780 system,6 which employs 64K x 4-bit
high-speed static RAM chips in the main memory units. Figure 5 shows the
adopted code, possessing the odd-weight column characteristic and eight
bits of error-detection capability over any two bytes, with K = 64 and
r = 8.

Table 1. Check-bit lengths of codes for high-speed memories with
information-bit lengths of k = 32, 64, and 128.

Code class                           Check-bit length, where
                                     k = 32     k = 64     k = 128
SEC                                  6          7          8
SEC-DED                              7          8          9
DEC   BCH-based                      12-16      14-18      16-23
      Majority-logic decodable       16-24      22-32      31-48
SEC-DED-SbED   b = 4                 7          8          9
               b = 8                 10         10         11
SbEC           b = 4                 8          12         12
               b = 8                 16         16         16
SbEC-DbED      b = 4                 12         16         16
               b = 8                 24         24         24

Table 1 lists the codes used in the high-speed memories of some commercial
systems. Figure 6 shows the estimated gate count of the decoding circuit
and the decoding speed for those codes.
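The SEC and SEC-DED rows of Table 1 can be reproduced from the standard Hamming condition, which the article does not state explicitly and is assumed here: r check bits suffice for single-error correction of k information bits when $2^r \ge k + r + 1$, and one further check bit adds double-error detection. The function names in this Python sketch are illustrative.

```python
def sec_check_bits(k):
    """Smallest r with 2^r >= k + r + 1 (distinct nonzero syndromes)."""
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r

def sec_ded_check_bits(k):
    """One extra check bit raises the minimum distance from 3 to 4."""
    return sec_check_bits(k) + 1

def odd_weight_column_check_bits(k):
    """Smallest r whose 2^(r-1) odd-weight columns cover k + r code bits."""
    r = 1
    while 2 ** (r - 1) < k + r:
        r += 1
    return r

table_rows = {k: (sec_check_bits(k), sec_ded_check_bits(k)) for k in (32, 64, 128)}
```

The odd-weight-column count gives the same answer as the Hamming SEC-DED count for these word sizes, so the simpler decoder comes at no cost in redundancy.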

Error control in mass 
memories 

Characteristic problems with magnetic tapes and disks and optical disks
include burst errors, caused by defects and dust particles on the recording
surfaces, and random errors, caused by noise in the read/write heads. The
above-described R-S byte error-correcting/detecting codes, interleaved with
other codes and using erasure correction techniques, handle these problems
quite effectively.

Tape memory codes. Magnetic tapes are widely used in computer systems. The
half-inch, nine-track tape system is particularly prevalent and has evolved
through extensive use. Errors in magnetic tape recording are primarily due
to defects on the magnetic media or to variations in head-media separation
in the presence of dust particles. Such errors often affect as many as 100
bits at a time, depending on recording density. Because of increased bit
densities and tape speeds, tape systems require more sophisticated
error-correction codes. Newer coding schemes include the so-called optimal
rectangular code (ORC) for 6,250-bits-per-inch nine-track tape units and
the adaptive cross-parity (AXP) code for higher density, 18-track
subsystems.

Figure 6. Decoder gate counts and speeds for the high-speed memory codes.

Figure 7. Data format encoded in AXP code for the 18-track high-density
subsystem.
ORC is designed to correct any single- 
track or (given erasure pointers) any 
double-track error in the tape. ORC code 
words have a rectangular format, and 
check bits are located on two orthogonal 
sides of the rectangle. ORC is generated 
using S8EC R-S codes. 

The AXP code,7 included in a class of convolutional codes, possesses a
simple block structure compared to ORC, which is based on vertical and
cross-parity checks. In this coding scheme, the 18 tracks are divided into
two sets, each consisting of seven data tracks and two check tracks. Figure
7 shows the data format grouped into two sets. Set A consists of nine
parallel tracks, and set B consists of the remaining nine parallel tracks.
In this figure, the two sets are shown side by side with a symmetrically
ordered track arrangement. Each check bit in track 0 of set A and set B
provides a cross-parity check along the diagonal, with positive slope and
negative slope, respectively, involving bits from both sets. Each check bit
in track 8 of set A and set B is a vertical parity over the bits of the
same position m in sets A and B. The encoding and decoding equations, that
is, the four parity equations, are all simple. The erroneous track can be
identified by the external pointers. Using these pointers, up to four
erroneous tracks spanning over two sets, or up to three erroneous tracks in
one set, can be corrected.

Disk memory codes. Magnetic disk memory has played an important role in
high-speed, large-capacity file memory for many years. As with magnetic
tape systems, burst errors predominate. Most errors are related to
imperfections on disk surfaces or surface irregularities. The remaining
errors are mostly due to heads, which are susceptible to random
noise-induced errors. The disk has a higher rotation speed and,
consequently, a higher data rate than tape. Hence, its sensing circuit
design must allow for greater tolerance. In recent disk systems, the
interleaved R-S code has replaced the Fire burst-error-correcting code.
Apart from these, other important recovery techniques such as defect skip,
alternate data block, and reread are used to enhance reliability and data
integrity.
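Why interleaving helps against bursts can be seen in a few lines of Python. This is a generic block-interleaving sketch, not the layout of any particular disk format; the function names, the depth of 4, and the burst position are all illustrative.

```python
def interleave(codewords):
    """Write code words as rows and read them out column by column."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]

def deinterleave(stream, depth):
    """Undo interleave(): code word j is every depth-th symbol from j."""
    return [stream[j::depth] for j in range(depth)]

depth = 4
codewords = [[(j, i) for i in range(6)] for j in range(depth)]  # tagged symbols
stream = interleave(codewords)

# A burst wipes out 4 consecutive symbols on the medium.
damaged = list(stream)
for pos in range(5, 9):
    damaged[pos] = None

received = deinterleave(damaged, depth)
errors_per_codeword = [cw.count(None) for cw in received]
```

A burst no longer than the interleaving depth touches each code word at most once, so a single-byte-correcting R-S code applied per code word is enough to repair it.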

Digital optical disks are a relatively new technology for storing data.
Each disk is coated with special materials. Reading and writing are
performed using a laser. Almost all errors are related to imperfections in
the disk, and the remaining errors result from focusing shift in recording
or random noise in reading. Therefore, this medium requires both
burst-error-correcting and random-error-correcting facilities.

Compact disc (CD) digital audio systems8 adopted a new code design
technique, the cross-interleaved R-S code (CIRC). CIRC is a new class of
doubly encoded codes in which the second R-S code encodes the
cross-interleaved outputs of the first encoded R-S code.

Figure 8. The (4, 2) computer.


The digital data-storage system called compact disc ROM (CD-ROM) has a
capacity of almost 540 megabytes on one disk. CD-ROMs use doubly encoded
R-S SbEC and cyclic redundancy check (CRC) codes in addition to CIRC. The
two R-S codes used are (26,24) and (45,43) codes over $GF(2^8)$. Thus, the
data is effectively quadruply encoded. If CRC is also included, then data
is effectively quintuply encoded.

Two types of optical systems, the write-once read-many optical disk
(CD-WORM) and the writable or erasable optical disk, have been popular in
computer mass memories. These memories use a large-distance (distance-17)
R-S code with 120- to 140-byte code length, interleaved to degree 4 to 10.

Processor error 
control 

In 1972, Pradhan and Reddy9 showed that parity code using redundancy of the
order of duplication could achieve effective error control. This
theoretical prediction that only duplication can achieve error correction
in processors9 has been realized in some sense in the (4,2) computer.10
Figure 8 illustrates this concept, which is already seeing use as a
commercial product, the Philips S2500 switching system.

The S2500 uses four processor-memory pairs. Each processor is 16 bits wide,
but the memory is only 8 bits wide. Therefore, there is fourfold redundancy
in the processor but only twofold redundancy in the memory. The 16-bit
output of each processor is encoded into a (4,2) version of $GF(2^8)$ R-S
code, yielding a 32-bit code. This code can correct any single 8-bit byte.
The output of each processor is encoded by separate encoders. Each encoder
produces only 8 bits of the 32-bit code word. The memory associated with
the i-th processor stores the i-th byte; therefore, the encoder for the
i-th processor produces the 8 bits corresponding to the i-th byte. In other
words, the first processor's memory stores the first byte, the second
processor's the second byte, and so on.

On the read operation, all four bytes are fetched from the memory and
decoded by the decoder, which can correct any single byte in error or any
double-bit error in two different bytes. Note that a single processor or
memory failure affects only a single byte in the code word; thus, the
system can tolerate failure of any single processor-memory pair. In
addition, the code can correct single-byte-erasure and single-bit errors.
Thus, one can remove a processor-memory pair for repair and continue to
operate the system using error-erasure decoding. Any subsequent single-bit
error will get corrected.

This use of coding in processor error control is quite attractive. The only
real drawback is that, since the address lines to the memory are not
encoded, no error correction takes place on a write operation. In general,
we believe that, by using nontraditional approaches such as the one
described above, parity check codes can provide effective processor error
control. Their use will eliminate the need for code conversion and its
associated delay in memory-to-processor transfer, thus providing uniform
error control.

Unidirectional error-control codes

Unidirectional error codes have found recent applications in defect
tolerance for 4-megabit VLSI ROMs and large-area LCDs by NTT3 and have,
therefore, been the focus of recent research. Unidirectional errors11 are
defined as a class of errors where the mode of errors is presumed to be
either 1 to 0 or 0 to 1 in a particular code word. However, no prior
knowledge of which type of error may occur is assumed. Therefore, in a
particular transmission, the receiver may receive two successive words with
two different types of errors, but any individual word may have only 1 to 0
or 0 to 1 errors. (This should not be confused with asymmetric errors where
all code words are presumed to have the same type of errors and the type is
known beforehand.)


Fundamentals. Codes specifically designed for unidirectional errors are
receiving limited but growing attention.12-16 One of the assumptions in
developing these codes is that errors induced by transient and intermittent
faults are limited either to a small number of symmetric errors or to an
unbounded number of unidirectional errors. Consequently, much of the
research has been devoted to developing codes that can correct t symmetric
errors and detect all unidirectional errors. The assumption here is that
these faults cause a small number of symmetric errors that need to be
corrected, whereas faults that cause an unbounded number of unidirectional
errors need to be detected.

The basic framework11 for these codes shows that for a code to correct t
symmetric errors and detect all unidirectional errors, the code must have a
minimum asymmetric semidistance of (t + 1).* Asymmetric semidistance can be
defined as follows: Given two vectors X and Y, let $d_{10}(X, Y)$ represent
the number of positions in which X is 1 and Y is 0. Unlike Hamming
distance, $d_{10}(X, Y)$ need not be equal to $d_{01}(X, Y)$, the number of
positions in which X is 0 and Y is 1. For example, for X = 0110 and
Y = 1000, $d_{10}(X, Y) = 2$ and $d_{10}(Y, X) = 1$. The minimum asymmetric
semidistance $\Delta$ is defined as

$$\Delta = \min\{\, d_{10}(X, Y) \mid X, Y \in C,\ X \neq Y \,\}$$

Consider the code described in Tables 2 and 3. Using odd and even parity to
achieve $\Delta = 2$, the code can correct all single errors and detect all
unidirectional errors. Although this odd/even parity technique has not yet
been generalized, two different techniques have been formulated to derive
such codes.

*This result was derived independently by Pradhan,11 as well as by Bose and
Rao (see additional reading list).

July 1990

Table 2. A random error-correcting and unidirectional error-detecting code.

Information bits      Check bits
(positions 1, 2)      (positions 3-6)
d0  d1                c0  c1  c2  c3
0   0                 1   1   0   1
0   1                 1   0   1   0
1   0                 0   1   1   0
1   1                 0   0   0   1

Table 3. Syndrome bits and error positions.

s0  s1  s2  s3        Bit in error
0   0   0   0         None
1   0   1   1         1
0   1   1   1         2
1   0   0   0         3
0   1   0   0         4
0   0   1   0         5
0   0   0   1         6
All other             Multiple unidirectional
combinations          error detection

$$\begin{aligned}
s_0 &= c_0 \oplus d_0 \oplus 1 & s_2 &= c_2 \oplus d_0 \oplus d_1 \\
s_1 &= c_1 \oplus d_1 \oplus 1 & s_3 &= c_3 \oplus d_0 \oplus d_1 \oplus 1
\end{aligned}$$

Table 4. Code length (n) of some single-error-correcting and all known
unidirectional-error-detecting codes.

Information     Pradhan11    Bose and      Nikolos       Kundu and
bit length      1980         Pradhan13     et al.15      Reddy16
(k)                          1982          1986          1990
12              25           27            25            24
15              30           30            28            27
16              35           31            29            29
30              54           48            46            45
32              60           50            48            47
62              108          83            81            79
63              100          84            82            80
64              105          85            83            81
120             171          141           139           138
126             190          149           147           145
256             342          281           280           277

Table 5. Best known t-error-correcting and all unidirectional-
error-correcting codes by Bruck and Blaum.14
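The code of Tables 2 and 3 is small enough to check by hand or in a few lines of Python. The sketch below encodes with the parity equations of Table 3, decodes with the syndrome table, and computes the minimum asymmetric semidistance; the function names and status strings are illustrative.

```python
def encode(d0, d1):
    """Word [d0, d1, c0, c1, c2, c3]; c0, c1, c3 use odd parity, c2 even."""
    return [d0, d1, d0 ^ 1, d1 ^ 1, d0 ^ d1, d0 ^ d1 ^ 1]

def syndrome(w):
    d0, d1, c0, c1, c2, c3 = w
    return (c0 ^ d0 ^ 1, c1 ^ d1 ^ 1, c2 ^ d0 ^ d1, c3 ^ d0 ^ d1 ^ 1)

# Table 3: syndrome -> position (1-based) of the single bit in error.
ERROR_POSITION = {(1, 0, 1, 1): 1, (0, 1, 1, 1): 2, (1, 0, 0, 0): 3,
                  (0, 1, 0, 0): 4, (0, 0, 1, 0): 5, (0, 0, 0, 1): 6}

def decode(w):
    s = syndrome(w)
    if s == (0, 0, 0, 0):
        return list(w), "no error"
    if s in ERROR_POSITION:
        fixed = list(w)
        fixed[ERROR_POSITION[s] - 1] ^= 1
        return fixed, "single error corrected"
    return None, "multiple unidirectional error detected"

def d10(x, y):
    """Number of positions in which x is 1 and y is 0."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 0)

def min_asymmetric_semidistance(code):
    return min(d10(x, y) for x in code for y in code if x != y)

CODE = [encode(d0, d1) for d0 in (0, 1) for d1 in (0, 1)]
delta = min_asymmetric_semidistance(CODE)
```

Every ordered pair of code words has $d_{10} = 2$, so $\Delta = 2 = t + 1$ with t = 1, in line with the framework above: one symmetric error is corrected, and any larger error whose flips all go in one direction lands on a syndrome outside Table 3.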

For nonsystematic codes, the basic technique is to extract from a set of
constant-weight code words a subset that has the desired $\Delta$. For
systematic codes,11 the basic technique is to combine two different forms
of redundancy, one for symmetric error correction and the other for
unidirectional error detection. The original combining technique did not
produce efficient codes, but it was modified and has become the basis of
many ingenious techniques that produce a large class of very efficient
codes. However, constructing codes that meet the bound derived by Bose and
Pradhan13 remains an elusive goal. The bound states that the number of
check bits required to correct t errors and detect all unidirectional
errors is bounded from below by $O((t+1)\log k)$, but the codes discovered
so far require about $O((2t+1)\log k)$ check bits. Table 4 compares certain
single-error-correcting and all unidirectional-error-detecting codes; Table
5 presents some of the best multiple-error-correcting and all the
unidirectional-error-detecting codes known at the time of writing this
article.14

Another area not yet fully explored is 
multivalued unidirectional error codes, 
where the code alphabets are over q-ary 
symbols. Here, the unidirectional error is 
defined as an error that either increases or 
decreases the component values. Bose and 
Pradhan 13 have presented an optimal class 
of unidirectional multivalued error-detecting
codes, but other generalizations have
not been explored. 

The following is an example of a real-world application of a unidirectional
code. 3

Application to masking asymmetric line faults. Unidirectional codes have
found a real-world application in VLSI microprocessors, where the bus line
area increases as the processor word length increases. Since these lines
connect circuit elements, line faults or defects seriously affect LSI chip
yield and reliability. Unidirectional error codes have been applied in NTT
4-megabit VLSI ROMs 3 to mask these asymmetric line faults (defects in the
address bus lines), without the need for additional circuits such as error
decoders.

Short-circuit and open-circuit defects in bus lines change the signal line to
one of several levels: high, medium, or low. By controlling the bus driver
and the bus terminal gate, however, the level of the faulty line can always
be made either high or low. For open-circuit defects in the bus lines, adding
a pull-up transistor at the tip of the bus line makes the isolated part of
the line electrically high. For short-circuit defects, the bus drivers can be
designed to maintain a high level on all bridged lines. The driver makes its
output impedance high when the input signal level is low. Such a fail-safe
design achieves asymmetry in errors. It makes the probability of “1” errors
(1 becoming 0) extremely small compared to “0” errors (0 becoming 1). Thus,
we’ll concentrate on “0” errors. These asymmetric errors can be masked by new
coding techniques.

Figure 9. Circuit for masking single, unidirectional “0” errors on bus lines.

Figure 9 shows a bus line circuit that can mask single “0” errors. The
decoder $G$ consists of AND gates $g_0$ to $g_3$ corresponding to code words
$V_0$ to $V_3$ in code $C$. Because each gate has transistors at the bus line
where the element in the code word is “1,” the gate is only activated by
receipt of the corresponding code word.

The circuit in Figure 9 works correctly even if there is a single asymmetric
“0” error in the bus lines. We assume that the input information is given and
then encoded into code word $V_0 = (111000)$, and we also assume that one “0”
error occurs in the fifth line, $x_4$. The code word $V_0$ is changed to
$V_0' = (111010)$. However, AND gate $g_0$, which would be activated only by
$V_0$ if there were no fault, is still activated because $V_0'$ has 1’s at
the positions where $V_0$ has 1’s, that is, $x_0 = x_1 = x_2 = 1$. The other
AND gates, $g_1$ to $g_3$, cannot be activated for $V_0'$ because $V_0'$ has
at least one 0 at a position where $V_1$ to $V_3$ have 1’s: $x_3$ for $V_1$
and one of the other two code words, and $x_5$ for the remaining one. Hence,
one asymmetric line fault never causes faulty activation and can be masked.

In general, when “0” errors change the code word $Y$ to $Y'$, any other code
word, say $X$, in $C$ has to satisfy the condition $N(X, Y') \neq 0$. This
causes the bus line circuit to work correctly. The condition states that
there are one or more positions where $X$ has 1 and $Y'$ has 0. From this, it
can be easily proven that a code $C$ with $\Delta = t + 1$ can mask $t$
asymmetric errors.
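
The masking argument can be tried out on a small model. The following is a minimal sketch, not the circuit of Figure 9 itself: only the code word (111000) comes from the text, while the other three code words are hypothetical choices made so that any two words share at most one 1. The helper names `n`, `active_gates`, and `all_single_zero_errors_masked` are likewise illustrative.

```python
# Hypothetical 4-word code of length 6. Only V0 = (1,1,1,0,0,0) is taken from
# the text; V1..V3 are assumed so that any two words share at most one 1.
CODE = [
    (1, 1, 1, 0, 0, 0),  # V0 (from the text)
    (1, 0, 0, 1, 1, 0),  # V1 (assumed)
    (0, 1, 0, 1, 0, 1),  # V2 (assumed)
    (0, 0, 1, 0, 1, 1),  # V3 (assumed)
]


def n(x, y):
    """Count the positions where x has 1 and y has 0."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 0)


def active_gates(received, code=CODE):
    """Gate g_i fires when every line that is 1 in V_i is 1 on the bus."""
    return [i for i, word in enumerate(code) if n(word, received) == 0]


def all_single_zero_errors_masked(code=CODE):
    """Flip each 0 of each code word to 1 and check that only its own gate fires."""
    for i, word in enumerate(code):
        for pos, bit in enumerate(word):
            if bit == 0:
                faulty = word[:pos] + (1,) + word[pos + 1:]
                if active_gates(faulty, code) != [i]:
                    return False
    return True


faulty_v0 = (1, 1, 1, 0, 1, 0)  # V0 with a "0" error on line x4
print(active_gates(faulty_v0))
print(all_single_zero_errors_masked())
```

With this assumed code, a second “0” error on the same word can activate a wrong gate as well, which is in line with the claim that masking $t$ errors needs a larger distance.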


As implementation costs for error-control coding continue to decrease, we
expect more and more applications to be found. Indeed, the development of
cost-effective low-level techniques may offset the need for massive
high-level redundancy. 12 Therefore, the major challenge of the future is
developing an integrated design framework where we can study the various
trade-offs between low-level and high-level redundancy techniques.
Fault Tolerance in VLSI Circuits

Israel Koren and Adit D. Singh 
University of Massachusetts at Amherst 


The primary motive for introducing fault tolerance in VLSI circuits is yield
enhancement, increasing the percentage of fault-free chips obtained. The
active area of monolithic VLSI chips has always been limited by random
fabrication defects, which appear impossible to eliminate in even the best
manufacturing processes. The larger the circuit, the more likely it will
contain such a defect and fail to operate correctly. Thus, the defect density
(number of defects per unit of chip area) in any fabrication line limits the
size of the largest defect-free chip producible with commercially viable
yields. Larger circuits demand a fault-tolerance capability to overcome
fabrication defects while avoiding unreasonable costs.

The first VLSI circuits, produced in the 1970s, contained a small number (by
today’s standards) of devices and consequently an acceptably low average
number of defects per chip. Subsequent increases in circuit complexity
resulted mainly from higher integration densities produced by reducing the
device feature size. Reduced feature sizes tend to make circuits more
sensitive to very small defects. However, this was compensated for by reduced
defect densities in the more advanced fabrication lines.

As feature sizes enter the submicron level and increased circuit density from
reduced feature sizes becomes more difficult to achieve, some designers have
turned to larger area circuits and even full-wafer integration. They hope to
obtain the cost reductions, size reductions, and performance improvements
associated with higher levels of integration.

Fault-tolerant designs of very large ICs primarily attempt to enhance yield.
Such designs, first employed in memory chips, now encompass random logic VLSI
and wafer-scale circuits.

The prohibitively low defect-free yield of large circuits mandates on-chip
fault tolerance for yield enhancement. The incorporation of fault tolerance
is sometimes required not only to increase productivity but to ensure
feasibility. For example, wafer-scale circuits would have zero yield without
fault tolerance.

To see how on-chip redundancy can enhance yield, consider Figure 1, which
shows the schematic of a wafer on which individual circuit modules have been
fabricated. These modules can be memory arrays, microprocessors, or any other
functional units. In the discussion that follows, we assume that these
modules are processors. Assume a total of 240 processors fabricated on the
wafer and a 0.5 probability that any individual processor is defect-free
(meaning the die yield is 50 percent). We can expect an average of 120 good
circuits from the wafer.

Suppose we want to implement a four-processor system on a single chip.
Ignoring interconnections, each die contains four processors and is four
times larger than before. We would obtain a total of 60 multiprocessor chips
from the wafer. However, since all four processors must be defect-free for
the die to be functional, the yield is $(0.5)^4 = 0.0625$. Thus, we can only
expect, on average, $60 \times 0.0625 = 3.75$ good multiprocessor chips per
wafer.

Now consider the possibility of introducing redundancy to provide fault
tolerance. Suppose we put five processors on each multiprocessor chip. We now
have a functional chip if no more than one processor is faulty. With this
single-fault-tolerance capability, the yield of the multiprocessor chip
becomes $(0.5)^5 + 5(1-0.5)(0.5)^4 = 0.1875$. Since the chips are bigger now,
we will only obtain 240/5 = 48 chips from the wafer, and we can expect an
average of $48 \times 0.1875 = 9$ good multiprocessor chips per wafer. The
redundancy more than doubled the yield in our example.

So far we have assumed that introducing fault tolerance only required a spare
processor and no other overhead. In practice, additional switch and
interconnect structures are often required to enable the spare to take over
the functions of the failed module (processor). In terms of increased
circuitry, this overhead can be substantial, particularly with small modules.
Furthermore, often the reconfiguration hardware is not fault tolerant, so a
defect in these circuits can be fatal.

To study the effect on yield of the overhead caused by the reconfiguration
mechanisms, let us arbitrarily assume that the extra reconfiguration hardware
for our example system has an area and yield equal to that of a single
processor. The fault-tolerant chip will then have an area equal to that of
six individual processors. For the chip to be functional, the reconfiguration
circuitry and at least four of the five processors must be defect-free. This
gives a chip yield of $0.5 \times ((0.5)^5 + 5(0.5)^5) = 0.09375$. Since each
wafer now has 240/6 = 40 chips, we can expect, on average,
$40 \times 0.09375 = 3.75$ good chips per wafer, no better than the
nonredundant design.

A smaller reconfiguration circuit, taking half the area of a processor, does
enhance yield, with 6.136 good chips expected per wafer. However, if the
reconfiguration overhead exceeds the area of a processor, introducing fault
tolerance into the multiprocessor chip can become counterproductive.
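
The arithmetic of this running example can be reproduced with a short script. This is a sketch under the example's own assumptions (240 processor sites, 0.5 processor yield, independent defects); the function names are illustrative. The text does not state the yield assumed for the half-area reconfiguration circuit; the value 0.75 used below is an assumption that happens to reproduce the quoted 6.136.

```python
from math import comb


def chip_yield(n_proc, n_needed, p_good, overhead_yield=1.0):
    """Probability that at least n_needed of n_proc processors are defect-free,
    multiplied by the yield of any non-redundant overhead circuitry."""
    working = sum(
        comb(n_proc, k) * p_good**k * (1 - p_good) ** (n_proc - k)
        for k in range(n_needed, n_proc + 1)
    )
    return overhead_yield * working


def good_chips_per_wafer(sites, chip_area, y):
    """Expected good chips: chips per wafer (in processor-area units) times yield."""
    return sites / chip_area * y


SITES, P = 240, 0.5
no_spare = good_chips_per_wafer(SITES, 4, chip_yield(4, 4, P))
one_spare = good_chips_per_wafer(SITES, 5, chip_yield(5, 4, P))
full_overhead = good_chips_per_wafer(SITES, 6, chip_yield(5, 4, P, overhead_yield=0.5))
# Assumed: a half-area reconfiguration circuit with yield 0.75.
half_overhead = good_chips_per_wafer(SITES, 5.5, chip_yield(5, 4, P, overhead_yield=0.75))
print(no_spare, one_spare, full_overhead, round(half_overhead, 3))
```

The four printed values correspond to the nonredundant chip, the chip with one spare and no overhead, the chip with a processor-sized reconfiguration circuit, and the chip with a half-sized one.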

We therefore see that fault tolerance schemes for VLSI yield enhancement must
be area efficient. Otherwise, the yield improvement due to the
fault-tolerance capability might be negated by increased circuit area and the
potential for additional faults in the reconfiguration circuitry.

To more accurately analyze the expected yield in the above example and in
other defect-tolerant designs, we need to understand the nature of
manufacturing defects and the types of restructuring techniques available in
practice. In this article we describe the defects that can occur when
manufacturing VLSI ICs and the potential resulting faults, some commonly used
restructuring techniques for avoiding defective components, and several
defect-tolerant designs of memory ICs, logic ICs, and wafer-scale circuits.
We will introduce yield models for chips with redundancy to predict the yield
of such chips and determine the optimal amount of redundancy.

Defects and faults 

We can classify manufacturing defects as gross area defects (or global
defects) and spot defects. Global defects are relatively large-scale defects,
such as scratches from wafer mishandling, large-area defects from mask
misalignment, over- and under-etching, etc. Spot defects are random local
defects from materials used in the process and environmental causes, mostly
unwanted chemical and airborne particles deposited during the various steps
of the process.

These two classes of defects contribute to yield losses. In mature,
well-controlled fabrication lines, manufacturers can minimize gross area
defects. The yield loss due to random spot defects is typically much higher
than the yield loss due to global defects. This proves especially true for
large-area integrated circuits, since the frequency of global defects is
almost independent of the die size. Consequently, spot defects concern us
more.

Some spot defects might cause missing patterns or open circuits, while others
cause extra patterns or short circuits. We can further classify these defects
into intralayer defects and interlayer defects. Intralayer defects occur as a
result of particles deposited during the lithographic processes and are
therefore also known as photolithographic defects. Examples include missing
metal (or diffusion or polysilicon) and extra metal (or diffusion or
poly-Si). Interlayer defects include missing vias between two metal layers or
between a metal layer and poly-Si, and shorts between the substrate and metal
(or diffusion or poly-Si) or between two separate metal layers. These
interlayer defects occur as a result of local contamination, such as dust
particles.

Not all spot defects are structural defects resulting in discrete faults such
as line breaks and short circuits. A defect causes a discrete fault only if
it is large enough to connect two disjoint conductors or disconnect a
continuous pattern. Consider, for example, the three circular
open-circuit-type defects in the layout of metal conductors depicted in
Figure 2. The two left-most defects will not disconnect the corresponding
conductors, but the third one will result in a discrete circuit fault.

Some random defects that do not cause structural faults can result in
parametric faults, where the electrical parameters of some devices lie
outside the desired operational window, affecting the performance of the
circuit. For example, an open-circuit-type photolithographic defect, although
too small to disconnect a transistor, might affect its performance.
Parametric faults can also result from global defects that cause variations
in process parameters. In principle, faults (structural or parametric) result
from the interaction of defects with the layout geometry. Thus, the
probability that a defect will cause a fault might depend on the exact
geometrical position of the defect and on its size, as illustrated in
Figure 2.

Now we will describe the method used to 
determine the percentage of manufacturing 
defects that result in discrete faults. This type 
of calculation is necessary to determine the 
expected number of circuit faults, on the 
basis of which yield projection (discussed 
later) is carried out. 

The average number of manufacturing defects of type $i$ (for example,
photolithographic defects of the open-circuit type) is usually described by
$d_i A$, where $d_i$ denotes the average number of defects (of type $i$) per
unit of area and $A$ is the chip area. To calculate the average number of
circuit faults, denoted by $\lambda_i$, we define a probability $\theta_i$
that a defect of type $i$ will result in a discrete circuit fault. Thus,
$\lambda_i = \theta_i d_i A$. The product $A\theta_i$, also called the
critical area for defects of type $i$, is denoted by $A_i^{(c)}$.

The probability that a defect will cause a circuit failure might remain
constant for one type of defect or depend on the size of the defects
(relative to the physical dimensions of VLSI patterns). For example, the size
of an interlayer defect is in most cases relatively small, and the
probability that it will cause a circuit failure equals the ratio between the
area of the overlapping region and the total area. In contrast,
photolithographic (intralayer) defects, like those shown in Figure 2, have a
randomly distributed size comparable to that of VLSI patterns. Therefore, the
probability that such a defect will cause a failure depends on the pattern
shape, its dimensions relative to the size of the defect, and its exact
geometrical position.

A commonly adopted assumption presumes circle-shaped defects with diameter
$x$, as shown in Figure 2. Experimental data on defects in many wafers lead
to the conclusion that the diameter $x$ of a defect has a density function
$f(x)$ that increases as $x^q$ up to the mode $x_0$ of the distribution (that
is, the value of $x$ for which the density function is maximal) and then
decreases as $1/x^p$ up to a maximum value of $x_M$. The exact values of $q$
and $p$ are determined empirically. Typical values for these are
$q \approx 1$ and $p \approx 3$, for which

$$f(x) = \begin{cases} c\,x/x_0^2 & \text{if } 0 \le x \le x_0 \\ c\,x_0^2/x^3 & \text{if } x_0 \le x \le x_M \\ 0 & \text{if } x > x_M \end{cases}$$

where $c = 1/\left(1 - \tfrac{1}{2}(x_0/x_M)^2\right)$.
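
As a quick sanity check on the constant $c$, the density can be integrated numerically to confirm that it sums to 1. This is only a sketch: the values of $x_0$ and $x_M$ below are illustrative assumptions, and the midpoint-rule integrator is a generic helper rather than anything taken from the article.

```python
def defect_size_density(x, x0, xM):
    """Defect-diameter density f(x) for q = 1 and p = 3."""
    if x < 0 or x > xM:
        return 0.0
    c = 1.0 / (1.0 - 0.5 * (x0 / xM) ** 2)
    if x <= x0:
        return c * x / x0**2       # rising branch, proportional to x
    return c * x0**2 / x**3        # falling branch, proportional to 1/x^3


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


x0, xM = 1.0, 10.0  # illustrative values (assumed), in arbitrary length units
total = integrate(lambda x: defect_size_density(x, x0, xM), 0.0, xM)
print(round(total, 4))
```

The rising branch contributes $c/2$ and the falling branch $\tfrac{c}{2}\left(1 - (x_0/x_M)^2\right)$, which is where the stated expression for $c$ comes from.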

[Figure 2. Layout of metal conductors with circular open-circuit-type
defects.]

We define the critical area for defects of diameter $x$ as the area in which
the center of a defect (of diameter $x$) must fall to cause a circuit
failure. We denote this critical area by $A(x)$ and compute its expected
value $A^{(c)}$ using

$$A^{(c)} = \int_0^{x_M} A(x) f(x)\,dx$$


We omitted the subscript $i$ from $A_i^{(c)}$ to simplify the expression. The
ratio between $A^{(c)}$ and the total area $A$ determines the percentage of
defects that cause circuit failures. Its calculation is thus necessary for
yield projection. In what follows, we illustrate how to calculate critical
areas through the layout in Figure 2, which shows two common geometrical
patterns in VLSI layouts: vertical conductors and an L-shaped conductor.

The critical area $A(x)$ for open-circuit defects in a conductor (dark blue)
of length $L$ and width $w$ is shown in medium blue in Figure 2. Its size is
given by 1

$$A(x) = \begin{cases} 0 & \text{if } x < w \\ (x-w)L + \tfrac{1}{2}(x-w)\sqrt{x^2-w^2} & \text{if } x \ge w \end{cases}$$

The critical area is a quadratic function of the defect size, but for
$L \gg w$, the quadratic term in $A(x)$ becomes negligible. Thus, for long
conductors we can use the linear term only. We can obtain an analogous
expression for $A(x)$ for short-circuit defects in a rectangular area of
width $s$ (between two adjacent conductors) by replacing $w$ with $s$ in the
above equation.
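
To make the role of the expected critical area concrete, the sketch below integrates the straight-conductor critical area against the defect-size density with $q = 1$ and $p = 3$. All numerical values ($L$, $w$, $x_0$, $x_M$, and the layout area) are assumed for illustration only, and the function names are not from the article.

```python
from math import sqrt


def density(x, x0, xM):
    """Defect-diameter density f(x) for q = 1 and p = 3."""
    if x < 0 or x > xM:
        return 0.0
    c = 1.0 / (1.0 - 0.5 * (x0 / xM) ** 2)
    return c * x / x0**2 if x <= x0 else c * x0**2 / x**3


def critical_area_open(x, L, w):
    """Critical area for open-circuit defects in a conductor of length L, width w."""
    if x < w:
        return 0.0
    return (x - w) * L + 0.5 * (x - w) * sqrt(x * x - w * w)


def expected_critical_area(L, w, x0, xM, steps=200_000):
    """Midpoint-rule approximation of the integral of A(x) f(x) from 0 to xM."""
    h = xM / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += critical_area_open(x, L, w) * density(x, x0, xM)
    return h * total


L, w, x0, xM = 100.0, 2.0, 1.0, 20.0   # assumed, arbitrary length units
layout_area = 1000.0                   # assumed total area A of the layout
a_c = expected_critical_area(L, w, x0, xM)
print(a_c, a_c / layout_area)          # expected critical area and its share of A
```

With these assumed numbers the linear term $(x-w)L$ accounts for almost all of the result, which illustrates why the quadratic term can be dropped for long conductors.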

The L-shaped conductor (depicted in Figure 2) is another common pattern in
VLSI layouts. We can approximate its critical area for open-circuit defects
with 1


fO if x < w 

A(X) = {(.x-w) (L|+ + \ (x-w) 

+ ijtx 2 -(*-w) 2 if x>w 

The expression for the critical area in this case closely resembles the one
for the vertical conductor. Again, if $(L_1 + L_2) \gg w$, the linear term in
$A(x)$ is the dominant one.

Common VLSI layouts consist of shapes similar to those shown in Figure 2 in
different sizes and orientations. Consequently, the exact expression for the
critical area of one layout will differ from that of another layout, making
it difficult to calculate the critical area of all but very simple and
regular layouts. Therefore, two other techniques have been proposed: Monte
Carlo simulation and virtual artwork. 2 In the Monte Carlo approach, circles
representing defects are placed at random locations of the layout and the
critical area is estimated. In the virtual artwork approach, an artificial
layout is extracted from the given layout to simplify the estimation of the
critical area. 2
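
The Monte Carlo idea can be illustrated on the simplest possible layout, a single straight conductor. The sketch below is not the tool described in reference 2; it assumes a simplified fault test (a circular defect breaks the conductor when it covers a full cross-section of it), and all names and parameter values are hypothetical.

```python
import random


def disconnects(cx, cy, x, L, w):
    """Simplified fault test (an assumption): the conductor occupies
    0 <= u <= L, -w/2 <= v <= w/2, and a defect of diameter x centered at
    (cx, cy) breaks it if the circle covers one full cross-section."""
    r = x / 2.0
    u = min(max(cx, 0.0), L)  # cross-section closest to the defect center
    return (u - cx) ** 2 + (abs(cy) + w / 2.0) ** 2 <= r * r


def monte_carlo_critical_area(x, L, w, samples=200_000, seed=1):
    """Estimate the critical area as (box area) * (fraction of fault-causing centers)."""
    rng = random.Random(seed)
    r = x / 2.0
    # The box [-r, L + r] x [-r, r] contains every center that can cause a fault.
    box_area = (L + 2.0 * r) * (2.0 * r)
    hits = 0
    for _ in range(samples):
        cx = rng.uniform(-r, L + r)
        cy = rng.uniform(-r, r)
        if disconnects(cx, cy, x, L, w):
            hits += 1
    return box_area * hits / samples


estimate = monte_carlo_critical_area(x=2.0, L=20.0, w=1.0)
print(estimate)
```

For a long conductor the estimate should land near the closed-form value for the straight conductor; exact agreement is not expected, because the fault test used here is a simplification and the estimate carries sampling noise.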

Restructuring techniques

When a VLSI circuit employs fault tolerance for yield enhancement, we must
test and reconfigure the components on chip to achieve fault-free operation.
For most existing circuits, wafer-probe testing successfully detects and
locates faults. However, several researchers have proposed the inclusion of
built-in test capabilities on chip.

Once we have located the faulty components, we must restructure the circuit
to avoid those components. Two approaches are employed. In the first, the
discretionary wiring approach, the components or cells on the chip are not
interconnected to begin with. Good components are connected by laying down
additional customized interconnection on the chip. In the second approach,
the interconnection structure is fabricated along with the rest of the
circuitry. Following testing, the signal-flow paths on the chip are
reconfigured using “switches” associated with the interconnections to bypass
faulty elements.

The discretionary wiring approach was first used in experimental wafer-scale
circuits at Texas Instruments in the mid-1960s. Here, blocks of logic were
first tested by wafer probes. A computer program then generated an
interconnection pattern to connect good blocks to achieve the desired logic
function. This pattern was fed to an electron-beam mask-making machine, which
created appropriate interconnection masks. These were then used to lay down
the metal interconnection lines and obtain the completed circuit.

These experiments successfully demonstrated the capabilities of the
technology, but the cost of creating the custom masks was a major drawback.
Also, the discretionary interconnect itself proved susceptible to defects,
leading to significant yield losses. Consequently, this technique did not
gain commercial acceptability.

Today, discretionary wiring has become an attractive approach for on-chip
fault tolerance. New, direct-write, electron-beam lithography machines can
quickly create the desired custom metallization patterns directly on the
wafer, eliminating expensive intermediate optical masks. Such machines are
now widely used to customize low-volume gate arrays. Of course, the problem
of additional defects in the customized wiring remains. However, we can
minimize these defects by using conservative design rules. We can also set
aside uncommitted channels for the discretionary interconnects so they do not
run on the uneven surface over active circuitry, where they are much more
prone to defects.

Recent research efforts have also focused on other techniques for
discretionary wiring. Lasers can etch patterns directly on the wafer.
Deposition can then be carried out using the laser to create a chemical
reaction in an appropriate gas at the wafer surface. Electron and ion beams
have also served in this application. These newer discretionary wiring
techniques also allow the repair of a completed VLSI circuit.

The second restructuring approach has the interconnections already fabricated
when the circuit is tested. This approach involves customizing the signal
paths using “switches” to bypass faulty components. This eliminates the
additional processing steps after testing needed by discretionary wiring.
Here, the switches that connect and disconnect signal lines can take many
forms. The simplest are conventional logic gates. However, these have the
problem of volatility: the chip would need reconfiguration at each power up.

Researchers at Lincoln Laboratories have developed electron-beam programmable
switches similar to those used in erasable programmable read-only memories.
Here, using an electron beam machine we can set a “floating” polysilicon gate
isolated in oxide to a high or low logic state as desired. Because the gate
is electrically isolated, it holds its charge, and the programming is
nonvolatile. This approach has an important advantage: we can program the
switches first to aid testing. Once we have identified all the faulty
components, the switches can be set for field operation.

A number of other programmed switch technologies have also been developed. In
defect-tolerant memory designs, the redundancy is usually programmed by
blowing polysilicon fuses, either by a laser or by electrical overstressing
with a high current.

Lincoln Laboratories’ Restructurable VLSI technology 3 also provides an
example of a switch-based restructuring approach. Here, logic modules are
laid out beside uncommitted buses. After testing both the modules and the
buses, we make the desired connections by linking (welding together) or
cutting apart the interconnections using lasers. The wafer is restructured in
steps in such a way that we obtain a sequence of increasingly large
subsystems, resulting finally in the complete system. At each step, further
tests check for any newly generated defects. To aid in this testing,
temporary links connect the subsystem to package test points. At present,
every link and cut is tested using optical probing, although the yield for
these operations is so high that such testing might be unnecessary in a
high-volume environment. The implementation of several different wafer-scale
circuits for signal processing applications has demonstrated the viability of
Restructurable VLSI technology.


Defect-tolerant designs 

Memory ICs were the first integrated circuits to exploit fault-tolerant
techniques. Memory chips are particularly dense and therefore extremely
vulnerable to manufacturing defects. Moreover, the demand for even higher
densities continues to increase. Denser memory chips allow designers to use
fewer chips in digital systems, reducing system integration costs, volume,
and power dissipation. In addition, the high regularity of memory arrays
greatly simplifies the task of incorporating defect-tolerance into their
design.

A variety of fault-tolerant techniques with a relatively small overhead have
been proposed and successfully implemented in memory ICs. We will review
those techniques and then present some recent proposals for defect-tolerant
designs of logic ICs and their extensions to wafer-scale integration.

Most methods for incorporating fault tolerance (that is, redundancy) into
VLSI ICs have the following objectives in addition to their main goal of
yield enhancement:

(1) No or very limited impact of the added redundancy on performance.

(2) Equal or higher reliability.

(3) Small additional area and power requirements.

(4) Transparency to the user (after chip reconfiguration).

(5) Fault-free ICs requiring no (or limited) additional manufacturing steps.

(6) Defective redundant elements replaceable by other redundant elements.

Increasing the yield of ICs proves especially important for new designs and
manufacturing processes, which have a high density of process-induced defects
and consequently a low yield. Yield improvements of early prototypes of an IC
can reduce the product’s introduction time and determine its commercial
success. Defect tolerance has proved extremely successful in such cases, and
spectacular 30-fold increases in yield have been reported. 4 Yield
improvements due to defect tolerance tend to decrease as the manufacturing
process matures. But even mature processes with lower defect densities have
experienced 1.5-to-3-fold yield increases, proving the effectiveness of
defect-tolerance techniques.

Memory ICs. Defect-tolerant designs for yield enhancement started around 1979
with 64-kilobit memories and continue today with 4-megabit RAMs and beyond.
The first approach used in memory ICs, and still the most common, adds spare
rows and/or spare columns (also known as word lines and bit lines,
respectively). The high regularity of memory arrays allows the use of a
limited number of spare rows and columns (that is, a low redundancy overhead)
for a large number of repetitive circuits. A defective row or a row
containing one or more defective memory cells can be disconnected and then
replaced by a spare row. Each spare row has a dedicated programmable decoder,
allowing it to replace any defective row. Similarly, spare columns can
replace defective ones.

Figure 3. Simplified schematic of standard and spare row decoders.

The simplified schematic depicted in Figure 3 illustrates standard and spare
row decoders. A fusible link connects a standard decoder to its associated
row of memory cells. Similar fusible links are used at the inputs to the
spare decoder. These fusible links can be blown individually using one of the
techniques described earlier. A spare decoder, as shown in Figure 3, has
double the number of inputs that a standard decoder has, for both true and
complement inputs. By selectively blowing half of the fuses at its inputs, it
can replace any standard decoder whose associated row is defective. The
defective decoder will be disconnected by blowing the single fuse at its
output.

Note that if a spare decoder is not required because of the small number or
absence of defects, it will be deselected and not need any fuse blowing.
Also, if, after programming the spare decoder, the associated row of cells is
found defective, it can be disconnected by blowing the fuse at the output of
the decoder. Another spare can then be used.
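
A small behavioral model may help in following how a spare decoder is programmed. This is a sketch of the logic only, assuming an AND-type decoder that selects its row when every still-connected input is 1; the class and method names are hypothetical and the model says nothing about the actual circuit in Figure 3.

```python
def address_bits(addr, n_bits):
    """Return the address as a list of bits, least significant first."""
    return [(addr >> i) & 1 for i in range(n_bits)]


class SpareRowDecoder:
    """Behavioral model: one fuse per true input, one per complement input,
    and one fuse at the decoder output. True means the fuse is intact."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.true_fuse = [True] * n_bits
        self.comp_fuse = [True] * n_bits
        self.output_fuse = True

    def program(self, defective_row):
        """Blow half of the input fuses so the decoder matches defective_row."""
        for i, bit in enumerate(address_bits(defective_row, self.n_bits)):
            if bit:
                self.comp_fuse[i] = False   # keep the true input
            else:
                self.true_fuse[i] = False   # keep the complement input

    def selects(self, addr):
        """The row is selected only if every connected input is 1."""
        if not self.output_fuse:
            return False
        for i, bit in enumerate(address_bits(addr, self.n_bits)):
            if self.true_fuse[i] and bit == 0:
                return False
            if self.comp_fuse[i] and bit == 1:
                return False
        return True


decoder = SpareRowDecoder(n_bits=4)
unused = [a for a in range(16) if decoder.selects(a)]      # no fuse blown yet
decoder.program(defective_row=5)
programmed = [a for a in range(16) if decoder.selects(a)]
print(unused, programmed)
```

In this model an unprogrammed spare never responds, because both the true and the complement of every address bit are still connected, which matches the remark that an unused spare needs no fuse blowing.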

The number of spare rows and/or columns is determined to optimize the yield,
after taking into account the additional area required for the redundant
circuitry and the probability of defects occurring in these circuits. We will
describe the method used to determine the optimal amount of redundancy later.

After manufacturing, testing the chips determines the location of defects.
Then the chips are reconfigured by disabling the defective rows and/or
columns and programming the decoders of spare rows and columns to replace the
defective ones. Thus, the repair of defective memory ICs consists of three
phases: a diagnosis phase to detect and locate all defective memory cells; a
repair-analysis phase to allocate spare rows or columns to all faulty cells;
and a repair phase to disconnect the defective cells and program the
allocated spares. All three phases are performed in a fully automatic manner
and require no manual intervention. 4
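
The repair-analysis phase can be pictured with a simple heuristic. The sketch below is not the procedure used in reference 4; it is an illustrative "must-repair, then greedy" allocation with hypothetical names, and because it is a heuristic it may report failure in cases where an exhaustive search would still find a valid allocation.

```python
from collections import Counter


def allocate_spares(faulty_cells, spare_rows, spare_cols):
    """Assign spare rows/columns to cover all faulty (row, col) cells.
    Returns (rows, cols) as sorted lists, or None if the heuristic fails."""
    remaining = set(faulty_cells)
    rows, cols = set(), set()
    while remaining:
        rows_left = spare_rows - len(rows)
        cols_left = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)
        # Must-repair rule: a row with more faults than the spare columns left
        # can only be fixed by a spare row (and symmetrically for columns).
        forced_row = next((r for r, k in row_count.items() if k > cols_left), None)
        forced_col = next((c for c, k in col_count.items() if k > rows_left), None)
        if forced_row is not None and rows_left > 0:
            kind, line = "row", forced_row
        elif forced_col is not None and cols_left > 0:
            kind, line = "col", forced_col
        elif forced_row is not None or forced_col is not None:
            return None  # a forced replacement is needed but no such spare is left
        else:
            # Greedy step: replace the line that covers the most faulty cells.
            r, rk = row_count.most_common(1)[0]
            c, ck = col_count.most_common(1)[0]
            kind, line = ("row", r) if rk >= ck else ("col", c)
        if kind == "row":
            rows.add(line)
            remaining = {cell for cell in remaining if cell[0] != line}
        else:
            cols.add(line)
            remaining = {cell for cell in remaining if cell[1] != line}
    return sorted(rows), sorted(cols)


faults = {(2, 0), (2, 5), (2, 9), (7, 5)}
print(allocate_spares(faults, spare_rows=1, spare_cols=1))
```

In the example, row 2 holds three faulty cells and therefore has to be replaced by the spare row, after which the remaining faulty cell is covered by the spare column.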


Error-correcting codes. Manufacturers frequently use error-correcting codes
(ECCs) in large memory systems to mask intermittent faults. Thus, using ECCs
for yield enhancement can contribute to reliability improvement as well.
However, the associated area overhead is much higher than with the simple
spare row/column scheme.
For an example of a memory IC employing an ECC for yield enhancement,
consider Mostek’s 1-megabit ROM, in which seven parity bits added to the 64
data bits increased area more than 11 percent. The 71 memory cells selected
simultaneously are positioned within the array so that any two selected cells
are separated by 15 unselected cells. This allows the chip to tolerate not
only single-cell failures but also clusters of multiple cell failures. A
maximum of 16 kilobit failures out of the 1 megabit can be corrected. The use
of ECCs might slow the memory, since the error-detection circuitry lies in
the critical path. This circuit was therefore designed to minimize the
increase in access time.

Recently, IBM developed an experimental 16-megabit dynamic RAM, adding nine
check bits to every 128 data bits. This chip combines ECCs with more
traditional bit and word-line redundancy and achieves higher yield
enhancement.
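
The check-bit counts quoted for these two chips can be related to the standard Hamming condition for single-error correction, $2^r \ge k + r + 1$ for $k$ data bits and $r$ check bits. The sketch below assumes that condition and one correctable bit per code word; the article does not state which code either chip uses, and the function names are illustrative.

```python
def min_sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2**r < data_bits + r + 1:
        r += 1
    return r


def words_in_memory(total_bits, data_bits_per_word):
    """Number of code words, hence the most single-bit failures correctable
    if each word can have one bit corrected (an assumption)."""
    return total_bits // data_bits_per_word


def max_hits_per_word(cluster_len, spacing=16):
    """Worst-case number of cells of one word hit by a run of adjacent failed
    cells, when cells of the same word sit every `spacing` positions."""
    return -(-cluster_len // spacing)  # ceiling division


rom_check = min_sec_check_bits(64)
dram_check = min_sec_check_bits(128)
rom_words = words_in_memory(2**20, 64)
print(rom_check, dram_check, rom_words, max_hits_per_word(16))
```

Under these assumptions, seven check bits is the minimum for 64 data bits, and 16,384 words of 64 bits is consistent with the quoted maximum of 16 kilobit failures. For 128 data bits the minimum is eight, so the nine check bits quoted for the IBM chip leave one bit beyond single-error correction; a common use of such a bit is double-error detection, but the text does not say how it is used.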

Associative approach. The spare row/column approach applies only to the
replacement of individual faulty rows or columns. If we need to replace
larger blocks of cells, as might happen with clustered rather than uniformly
distributed defects, an associative approach as developed by Haraszti at
Hughes Aircraft 5 looks attractive. The address of the defective block is
stored in an associative memory, and any incoming request to an address
within the defective block will be redirected to a spare block. The spare
block has a smaller size compared to the main memory array and, consequently,
its access time is substantially smaller. Thus, even with the additional time
required to access the associative memory (accessed in parallel to the main
memory) before the spare block can be accessed, the overall access time
increase is less than 2 percent. 5 The increase in power consumption is
insignificant (less than 0.6 percent), but the area increase is substantially
higher than for the spare row/column scheme, ranging from 10 percent for 64
kilobits to 27 percent for 1 megabit.

The associative approach can be extended 
to a hierarchical replacement scheme, where 
a large spare block can itself be repaired by 
another, smaller, spare block. 
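
The redirection step of the associative approach can be sketched in a few lines. This is a behavioral illustration only, with a Python dictionary standing in for the associative memory; the class name, method names, and block size are hypothetical, and the hierarchical extension is not modeled.

```python
class AssociativeRepair:
    """Redirect accesses that fall in a defective block to a spare block."""

    def __init__(self, block_size, spare_blocks):
        self.block_size = block_size
        self.free_spares = list(range(spare_blocks))
        self.remap = {}  # defective block index -> spare block index

    def mark_defective(self, address):
        """Record the block containing `address` as defective.
        Returns False if no spare block is left."""
        block = address // self.block_size
        if block in self.remap:
            return True
        if not self.free_spares:
            return False
        self.remap[block] = self.free_spares.pop(0)
        return True

    def resolve(self, address):
        """Return which array serves the access and the address within it."""
        block, offset = divmod(address, self.block_size)
        if block in self.remap:
            return ("spare", self.remap[block] * self.block_size + offset)
        return ("main", address)


memory = AssociativeRepair(block_size=256, spare_blocks=2)
memory.mark_defective(1000)   # address 1000 lies in block 3
print(memory.resolve(1000), memory.resolve(100))
```

In hardware the lookup runs in parallel with the main-array access, as the text notes; the sequential lookup here only shows the mapping, not the timing.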

Partially good chips. A different approach to yield enhancement suggests the
use of partially good chips. If sections in a 1-megabit memory are defective
beyond repair, we can reconfigure the chip to a usable 0.5-megabit chip or
even a 0.25-megabit chip. To do this, we must partition the circuitry of the
chip in such a way that fault-free sections can function independently.
Several manufacturers of memory ICs, like Motorola, IBM, and Westinghouse,
have used this technique successfully.

Figure 4. Row and column exclusion scheme. (PE = processing element; CE =
connecting element)

Note that this technique is orthogonal to other defect-tolerance schemes. For
example, the individual sections within the chip might have spare rows and
columns. Only when the available redundancy within a section is insufficient
to overcome all the defects present in this section will the section be
declared unusable.

IBM refined the idea of using partially 
good chips, further dividing the independent 
sections in the memory chip into smaller 
blocks. These can be used separately after 
proper alignment by steering the data bits to 
the right positions. 

Logic ICs. The development of efficient defect-tolerant designs for random logic ICs like microprocessors is considerably more complex than for memory ICs. However, if some regularity exists in the structure of a given logic circuit, it might be possible to incorporate redundancy. Programmable logic arrays (PLAs) are a natural target for defect-tolerant designs: they have a regular structure and implement random logic circuits in VLSI chips. The control sections of many microprocessors use large PLAs. Some designs have employed PLAs with as many as 50 inputs and almost 200 product terms. Since these PLAs require large silicon areas, the incorporation of redundancy in their design can considerably improve the overall yield.

Researchers have investigated defect-tolerant designs of PLAs 6 and proposed adding spare programmable product lines, input lines, and output lines to protect against all types of possible defects. This technique resembles the redundant row/column scheme for memory ICs. However, unlike memory ICs, where all defects can be identified by applying test patterns externally, the identification of defects in a PLA requires some built-in testing aids, like adding inputs to the AND plane.

Defect-tolerant PLAs with spare programmable product lines and added inputs for defect identification have recently been implemented within a 16-bit microprocessor. 7 This microprocessor also includes a defect-tolerant data path. A microprocessor’s data path includes arithmetic and logic units, registers, and buses and usually occupies a large percentage of the overall area. A bit-sliced data path, with the inclusion of one or more spare slices, can exploit the regularity in the circuit. However, not all parts of the data path are regular. For example, the logic circuits associated with the status bits are highly irregular. Since we cannot replace such parts with common spare circuits, we must exclude them from the bit-slice organization.

In logic ICs with irregular structures, duplication or even triplication of certain circuits might prove beneficial. If we use duplication, we must employ fault identification and then restructuring after manufacturing. In the case of triplication, we can avoid these additional steps by using a majority voter at the output, provided that at most one of the three identical circuits is defective.
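
A minimal sketch of the majority voter, written in Python for illustration. The helper names and the way a defective replica is modeled (its output bit is inverted) are assumptions made for this example only.

```python
def majority(a, b, c):
    """Bitwise two-out-of-three vote."""
    return (a & b) | (a & c) | (b & c)


def triplicated(circuit, inputs, faulty_copy=None):
    """Evaluate three replicas of `circuit`; optionally invert one replica's output."""
    outputs = [circuit(*inputs) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] ^= 1  # hypothetical defect: this replica's output is wrong
    return majority(*outputs)


def and_gate(a, b):
    return a & b


# The voted result matches the fault-free circuit for any single faulty replica.
for a in (0, 1):
    for b in (0, 1):
        for bad in (None, 0, 1, 2):
            assert triplicated(and_gate, (a, b), bad) == and_gate(a, b)
```

The voter itself is not protected in this sketch; a defect in the voter would still corrupt the output.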

In its attempt to build a mainframe based on wafer-scale emitter-coupled-logic technology, Trilogy employed replication for defect tolerance in random logic. However, the extremely large overhead (2-fold and up) associated with these techniques has substantially limited their use in general.

Wafer-scale integration. As we have seen, some manufacturers have already successfully employed on-chip fault tolerance to enhance the yield of high-density semiconductor memory chips. That approach works because, given the regular structure of memory cell arrays, a small amount of redundancy only slightly increases the circuit area while significantly increasing the number of good dies obtained from a wafer.

Since regular structures seem better suited for on-chip redundancy, processor arrays look like attractive candidates for full-wafer integration. Array architectures are being widely investigated to speed up the performance of computer systems through parallelism. For well-matched problems, they promise orders of magnitude improvement in performance over traditional sequential computers. If we could sufficiently reduce the cost and size of such specialized parallel processing arrays, they would be much more widely used.

Realizing the desired cost and size reductions for parallel processing arrays depends on VLSI and wafer-scale integration (WSI) technology. Implementing a processor array in a monolithic integrated package can also facilitate significantly greater operating speeds, because interprocessor signals will not have to be driven off chip. Off-chip signal propagation in metal-oxide-semiconductor VLSI systems is considerably slower than signal propagation within the chips. The same problem exists, to a lesser extent, with the new silicon-on-ceramic and silicon-on-silicon hybrid technologies competing with WSI for high-density packaging.

As device dimensions shrink further into the submicron range, reducing channel transit times to tens of picoseconds, this problem will become more acute. Signal propagation delays can critically limit the operating speed of VLSI systems, particularly highly pipelined array architectures employing fine-grained pipeline stages and extremely high clock rates. For example, a 330-MHz pipelined array multiplier reported by Siemens in West Germany would likely not be viable in a multichip implementation. Since many proposed array architectures require considerably more silicon than a conventionally sized die, the designs will clearly require WSI technology to achieve their full potential. Such an implementation might also improve reliability by eliminating mechanical and electrical failures often observed at the pins and interconnection of traditional board-level designs.

Several defect-tolerance approaches have been proposed for processor arrays. In one of the earliest schemes, the row and column exclusion approach, 8 redundant rows and columns of processors are implemented in the array. Each processor converts into a connecting element (CE) if required. If so, it no longer performs computational operations but merely passes signals from input to output. If a processor in the array fails, all other processors in the row and column containing the failed processor become CEs, as shown in Figure 4. A reconfigured fault-free array with one less row and column results. The failure of a horizontal (vertical) link between two adjacent processors requires turning only the processors along the corresponding row (column) into CEs.
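
The effect of processor failures under row and column exclusion can be sketched as follows. The function name and the coordinate convention are hypothetical, and link failures are left out of this simplified model.

```python
def exclude_rows_and_columns(rows, cols, failed):
    """Return the rows and columns that remain active after exclusion.

    `failed` is a set of (row, col) coordinates of failed processors. Every
    processor sharing a row or column with a failed one becomes a connecting
    element, so that whole row and column drop out of the logical array.
    """
    bad_rows = {r for r, _ in failed}
    bad_cols = {c for _, c in failed}
    kept_rows = [r for r in range(rows) if r not in bad_rows]
    kept_cols = [c for c in range(cols) if c not in bad_cols]
    return kept_rows, kept_cols


# One fault in a 4x4 array leaves a 3x3 logical array.
print(exclude_rows_and_columns(4, 4, {(1, 2)}))
# Two faults in different rows and columns already reduce it to 2x2.
print(exclude_rows_and_columns(4, 4, {(0, 0), (1, 1)}))
```

The second call shows how quickly the usable array shrinks once faults fall in distinct rows and columns.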

While attractive in its simplicity and low hardware overhead, the above scheme only proves effective when we expect few failures. Since an entire row and column must usually be disabled for each fault, multiple faults quickly degrade the array. To get around this problem, several other restructuring schemes have been proposed.

Figure 5 illustrates a simple scheme that adds spare columns to the array. The processors are reindexed in their rows so as to skip over the faulty processors. Once the reindexing has completed, the appropriate vertical connection can be made. For s spare columns, this scheme can tolerate up to s faults in each row. However, it requires a complex switch and interconnection structure to support this reconfiguration. Increased interconnection complexity can reduce yield because of increased area and also because of the possibility of defects in the interconnection.
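
The reindexing step can be sketched in a few lines. This is an illustrative model only: the function names and the fault-map representation are assumptions, and the switch network that would realize the connections is not modeled.

```python
def reindex_with_spare_columns(fault_map, logical_cols):
    """Map logical columns to fault-free physical columns, row by row.

    `fault_map` is a list of rows; each row is a list of booleans over the
    physical columns (True = faulty). Returns one list of physical column
    indices per row, or None if some row has too few good processors.
    """
    assignment = []
    for row in fault_map:
        good = [c for c, faulty in enumerate(row) if not faulty]
        if len(good) < logical_cols:
            return None
        assignment.append(good[:logical_cols])
    return assignment


def longest_vertical_shift(assignment):
    """Largest horizontal offset a vertical link must span after reindexing."""
    return max(
        (abs(a - b)
         for upper, lower in zip(assignment, assignment[1:])
         for a, b in zip(upper, lower)),
        default=0,
    )


F, T = False, True
layout = reindex_with_spare_columns([[F, T, F, F, F, F],
                                     [F, F, F, F, F, F]], logical_cols=4)
print(layout, longest_vertical_shift(layout))
```

The second helper gives a rough indication of how far vertical links may have to stretch, which is the source of the long-link concern raised for this scheme.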

A large number of other restructuring schemes for mesh arrays have been proposed. The objective generally is to reduce the probability that an available spare cannot replace a failed processor, while minimizing the restructuring overhead. Obviously, the more elaborate schemes only benefit arrays of relatively large processors, where the total silicon area consumed by the processors is large compared to the redundant interconnection.

The reconfiguration approach employed in Figure 5 can lead to relatively long links. This can negate the performance benefits of WSI, particularly in synchronous designs, where we must slow the clock to accommodate the longest delay. Furthermore, since we do not know a priori which interconnections will need reconfiguration to bypass failed nodes, we must implement all interconnections with powerful driver circuits capable of driving the worst-case restructured interconnections with acceptable delay. This can impose very significant area, power, and delay penalties on the design.

An alternative allows us to ensure at design time that the restructured interconnects will be short and bounded in length. The interstitial redundancy scheme 9 illustrated in Figure 6 achieves this. This approach systematically introduces spare processors throughout the array. Each spare can replace any neighboring primary processor. Since the reconfiguration is local, restructured interconnections stay short. Optimal assignment of spares to failed primary processors employs a bipartite graph that has failed primary processors as one set of vertices and operational spares as the other set. An edge connects a failed primary to a spare if the spare can replace it. Well-known matching algorithms can quickly find an assignment that covers all the failed primary processors, if such an assignment exists.
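
One such matching algorithm is the classic augmenting-path method. The sketch below is not taken from the interstitial redundancy paper; the function name and the dictionary-based graph representation are assumptions chosen for illustration.

```python
def assign_spares(can_replace):
    """Find a spare for every failed primary, or return None if impossible.

    `can_replace` maps each failed primary processor to the list of
    operational spares adjacent to it (the edges of the bipartite graph).
    Uses simple augmenting paths to build the matching.
    """
    spare_owner = {}  # spare -> failed primary currently assigned to it

    def try_assign(primary, visited):
        for spare in can_replace[primary]:
            if spare in visited:
                continue
            visited.add(spare)
            # Take a free spare, or re-route the current owner to another spare.
            if spare not in spare_owner or try_assign(spare_owner[spare], visited):
                spare_owner[spare] = primary
                return True
        return False

    for primary in can_replace:
        if not try_assign(primary, set()):
            return None
    return {primary: spare for spare, primary in spare_owner.items()}


# P2 first grabs S1, then is moved to S2 so that P1 can use S1.
print(assign_spares({"P2": ["S1", "S2"], "P1": ["S1"]}))
# Two failed primaries competing for a single spare cannot both be repaired.
print(assign_spares({"P1": ["S1"], "P2": ["S1"]}))
```

Because each spare in the interstitial scheme serves only a few neighbors, the graph is sparse and the search is cheap even for large arrays.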

Researchers using defect-tolerance strategies similar to those outlined above have implemented a number of experimental wafer-scale systems. Digital signal processing systems fabricated through Restructurable VLSI technology developed at Lincoln Labs have demonstrated the practicality of wafers with a few heterogeneous cell types. The ESPRIT (European Strategic Programme for Research in Information Technology) project also aggressively pursues WSI, including memory, microprocessor, and array processor designs.

The ELSA (European Large SIMD Array) two-dimensional array processor 10 employs a two-level hierarchical defect-tolerance approach. Since the processors operate on a single bit at a time, they are quite small, and several (12x7) can fit on a conventionally sized chip measuring 6x6 square millimeters. One column within this chip is redundant, so it can tolerate a single fault and still form a 12x6 array. At the wafer level, the chips are connected by a two-track interconnection network which can bypass faulty chips that contain more than one faulty processor or fail for other reasons.

The 3D Computer being developed by Hughes Research Laboratory from wafer-scale circuits employs the interstitial redundancy scheme to ensure short restructured interconnections. The physical location of a cell is close enough to its logical location in the array to permit making the interwafer connections between the cells. A 32x32 processor prototype rated at 600 million operations per second has already been demonstrated, and a 128x128 design is under development.

Yield estimations 

Designers considering a defect-tolerance technique for a VLSI circuit must estimate the projected yield. This allows them to determine the optimal amount of redundancy and the suitability of a proposed reconfiguration scheme.

The difficulty in modeling the yield of fault-tolerant IC chips arises mainly from the clustering of manufacturing defects during chip fabrication. Yield modeling proves relatively simple when we use Poisson statistics to describe the distribution of faults per chip. According to this distribution, the probability of having exactly $x$ faults in a chip is given by

$$\mathrm{Prob}\{X = x\} = \frac{e^{-\lambda}\,\lambda^{x}}{x!} \qquad (1)$$

where $X$ is a random variable denoting the number of faults and $\lambda$ denotes the average number of faults expected per chip. For chips with no redundancy the yield is

$$Y = \mathrm{Prob}\{X = 0\} = e^{-\lambda} \qquad (2)$$

As discussed earlier, the average number of faults per chip is given by

$$\lambda = \sum_{i} d_i\, A_i^{(c)} \qquad (3)$$

where $d_i$ represents the density of type $i$ defects and $A_i^{(c)}$ is the critical chip area for type $i$ defects.
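
Expressions (1) through (3) translate directly into code. In the sketch below the function names are hypothetical and the defect densities and critical areas are made-up numbers, used only to show how the quantities combine.

```python
import math


def average_faults(defect_densities, critical_areas):
    """Expression (3): sum of density times critical area over defect types."""
    return sum(d * a for d, a in zip(defect_densities, critical_areas))


def poisson_fault_probability(x, lam):
    """Expression (1): probability of exactly x faults on a chip."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_yield(lam):
    """Expression (2): probability of a fault-free chip."""
    return math.exp(-lam)


# Hypothetical values: two defect types, densities per cm^2, critical areas in cm^2.
lam = average_faults([0.5, 0.2], [1.0, 0.5])
print(lam, poisson_yield(lam))
```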

Practitioners have known since the early days of IC manufacturing that the above yield formula is too pessimistic and leads to predicted chip yields lower than actual yields. It later became clear that the low predicted yield resulted from ignoring the clustering of faults, a phenomenon observed in practice.

Proposed modifications to the above yield formula attempt to account for fault clustering. The most commonly used modification assumes the number of faults to be Poisson distributed, but considers the parameter $\lambda$ to be a random variable rather than constant. Making $\lambda$ a random variable results in clustering of faults, no matter what the type of distribution assumed for $\lambda$.

We obtain the modified yield formula by averaging yield formula (2) with respect to a probability density function of $\lambda$, denoted by $f(\lambda)$:

$$Y = \int_{0}^{\infty} e^{-\lambda} f(\lambda)\, d\lambda \qquad (4)$$

The function $f(\lambda)$ is known as a compounding function. Several compounding functions proposed in the past lead to different yield formulas. A common one, the Gamma distribution, 11 results in the well-known yield formula

$$Y = (1 + \bar{\lambda}/\alpha)^{-\alpha} \qquad (5)$$

where $\alpha$ is called the clustering parameter and $\bar{\lambda}$ is the average number of faults per chip. We can show that $\bar{\lambda}$ is, in effect, the expected value of $\lambda$. When the clustering parameter $\alpha$ is large, that is, when $\alpha \to \infty$, the yield in expression (5) becomes equal to yield formula (2). This represents the case of random faults and no clustering. Smaller values of $\alpha$ indicate increased clustering. Experimentally derived values for $\alpha$ typically range between 0.3 and 5.

Applying the same compounding procedure to the Poisson probability function for the number of faults in expression (1) results in the negative binomial distribution:

$$\mathrm{Prob}\{X = x\} = \frac{\Gamma(\alpha + x)}{x!\,\Gamma(\alpha)} \cdot \frac{(\bar{\lambda}/\alpha)^{x}}{(1 + \bar{\lambda}/\alpha)^{\alpha + x}} \qquad (6)$$

Yield formula (5) accounts only for faults resulting from spot defects. To account for gross area defects affecting large wafer areas, we must include a gross yield factor $Y_0$ in the yield model:

$$Y = Y_0\,(1 + \bar{\lambda}/\alpha)^{-\alpha} \qquad (7)$$
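
The clustered yield model can be explored numerically. The following is a sketch, with hypothetical function names, of the yield with a gross yield factor and of the negative binomial fault distribution. It assumes a mean fault count greater than zero and evaluates the distribution in log space to avoid overflow in the Gamma function.

```python
import math


def clustered_yield(mean_faults, alpha, gross_yield=1.0):
    """Yield with clustering parameter alpha and an optional gross yield factor."""
    return gross_yield * (1.0 + mean_faults / alpha) ** (-alpha)


def negative_binomial_pmf(x, mean_faults, alpha):
    """Probability of exactly x faults under the negative binomial model."""
    ratio = mean_faults / alpha
    log_p = (math.lgamma(alpha + x) - math.lgamma(x + 1) - math.lgamma(alpha)
             + x * math.log(ratio) - (alpha + x) * math.log1p(ratio))
    return math.exp(log_p)


# For the same mean fault count, stronger clustering (smaller alpha) raises the yield,
# and a very large alpha approaches the Poisson value exp(-mean).
for alpha in (0.5, 5.0, 1e6):
    print(alpha, clustered_yield(1.0, alpha))
print(math.exp(-1.0))
```

Setting x = 0 in the distribution recovers the yield formula, which is a quick consistency check on any implementation.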

Yield models for chips with redundancy. IC chips frequently include several replicated circuit modules. We can often use chips containing a number of identical modules (of one type or more) even if some of the modules do not function correctly, obtaining in this way partially good chips. Alternatively, we can add a few redundant modules to the design and accept only those chips with the necessary number of fault-free modules.

Consider chips with a single type of identical modules. Let $N$ denote the number of these modules. Define the following probability:

$$a_{M,N} = \mathrm{Prob}\{\text{exactly } M \text{ out of the } N \text{ modules are fault-free}\} \qquad (8)$$

We can use this probability to calculate the yield of chips with redundancy and the yield of partially good chips. For example, if $R$ out of $N$ modules are spares, meaning that a chip with at least $(N-R)$ fault-free modules is acceptable, then the yield of the chip is given by

$$Y = Y_0 \sum_{M=N-R}^{N} a_{M,N} \qquad (9)$$


To derive an expression for $a_{M,N}$, we need to know how to calculate the yield of a subset of $M$ modules. To this end we have to make some assumptions regarding the change in the parameters $\lambda$ and $\alpha$ of the yield formula when considering partial areas. The average number of faults depends linearly on the number of modules, $M$. However, the dependence of the clustering parameter, $\alpha$, on $M$ is less straightforward.

Most papers on IC yield that took fault clustering into account assumed the parameter $\alpha$ is the same when considering the whole chip or only part of the chip. This assumption was based on “large-area clustering,” meaning the clusters of defects exceed the chip size. This assumption often proved reasonable, since most clustering is caused by wafer-to-wafer variations of fault densities, especially for small-area chips.

Figure 7. The effective yield versus amount of redundancy ($Y_0$ = 0.9).

If we assume large-area clustering, we can calculate $a_{M,N}$ by first computing the probability that a given number of faults occurs in the complete chip, then distributing these faults uniformly among the $N$ modules. Thus, the probability that exactly $(N-M)$ modules will contain faults is

$$a_{M,N} = \sum_{x=N-M}^{\infty} Q_{x,N-M}^{(N)}\; \mathrm{Prob}\{X_N = x\} \qquad (10)$$

where $\mathrm{Prob}\{X_N = x\}$ is the probability that the chip has $x$ faults and $Q_{x,j}^{(N)}$ is the probability that, given $x$ faults, the faults are distributed among exactly $j$ out of $N$ modules. Assuming that faults are distinguishable, the latter equals

$$Q_{x,j}^{(N)} = \binom{N}{j} \sum_{k=0}^{j} (-1)^{k} \binom{j}{k} \left(\frac{j-k}{N}\right)^{x} \quad \text{for } x \ge j \text{ and } 0 \le j \le N \qquad (11)$$
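
The probability that x distinguishable faults, placed uniformly at random, land in exactly j of the N modules follows from inclusion-exclusion. The sketch below evaluates that standard expression with exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def q_exactly_j(x, j, n):
    """Probability that x uniformly placed faults occupy exactly j of n modules."""
    return comb(n, j) * sum(
        (-1) ** k * comb(j, k) * Fraction(j - k, n) ** x
        for k in range(j + 1)
    )


# Three faults over three modules: all in one module, in two, or one in each.
print([q_exactly_j(3, j, 3) for j in range(4)])
```

For any x of at least 1 the values over all j sum to 1, which provides a simple sanity check.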


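If we assume the negative binomial distribution from expression (6), then the above equation yields

$$a_{M,N} = \sum_{k=0}^{N-M} (-1)^{k} \binom{N-M}{k} \binom{N}{M} \left(1 + \frac{(M+k)\lambda}{\alpha}\right)^{-\alpha} \qquad (12)$$

where $\lambda$ here denotes the average number of faults per module.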


We obtain the negative binomial distribution from the Poisson distribution by averaging over all values of $\lambda$, using the Gamma distribution function. This compounding procedure can be applied to any statistical measure. We can derive an expression for the desired measure by assuming the convenient Poisson distribution (whose most useful property is the statistical independence between faults in different modules). We can then apply the compounding procedure to obtain the required expression for the negative binomial model. This powerful compounding procedure has been employed to derive yield expressions for interconnection buses in VLSI chips 12 and for partially good memory chips. 11

The simple architecture analyzed in the preceding section is an idealization; actual chips rarely consist entirely of identical circuit modules. All chips include support circuits (like power supply lines, clocks, input and output buffers, etc.) shared by the replicated modules. The chips become unusable when support circuits are damaged. Since the clustering of the support circuit faults is not independent of the clustering of the module faults, we need to include in expression (12) the average number of faults that cause defects in these support circuits. This gives

$$a_{M,N} = \sum_{k=0}^{N-M} (-1)^{k} \binom{N-M}{k} \binom{N}{M} \left(1 + \frac{\lambda_{CK} + (M+k)\lambda}{\alpha}\right)^{-\alpha} \qquad (13)$$

where $\lambda_{CK}$ is the average number of fatal faults, or chip-kill faults, in the support circuits.
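
The yield of a chip with spare modules can be computed by combining expression (9) with the large-area-clustering form of the probability in (12) and (13). The sketch below assumes that the fault average is given per module; the function and parameter names are hypothetical.

```python
from math import comb


def a_mn(m, n, lam_module, alpha, lam_ck=0.0):
    """Probability that exactly m of n modules are fault-free.

    lam_module is the average number of faults per module and lam_ck the
    average number of chip-kill faults in the shared support circuits.
    """
    return comb(n, m) * sum(
        (-1) ** k * comb(n - m, k)
        * (1.0 + (lam_ck + (m + k) * lam_module) / alpha) ** (-alpha)
        for k in range(n - m + 1)
    )


def chip_yield(n, spares, lam_module, alpha, lam_ck=0.0, gross_yield=1.0):
    """Yield when at least n - spares of the n modules must be fault-free."""
    return gross_yield * sum(
        a_mn(m, n, lam_module, alpha, lam_ck) for m in range(n - spares, n + 1)
    )


# Hypothetical numbers: four required modules, with zero, one, or two spares.
for spares in (0, 1, 2):
    print(spares, chip_yield(4 + spares, spares, 0.7, 2.0, gross_yield=0.9))
```

With no chip-kill faults the probabilities for all values of m add up to 1, and for very large alpha each term approaches the binomial value obtained from independent Poisson modules.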

We restricted the discussion above to the case where redundancy is provided to tolerate faults in a single type of circuit modules. However, the results have been extended to fault-tolerant chips with multiple types of modules. 13

The yield expression in (9) can help us find the optimal amount of redundancy for a given fault-tolerant scheme. The optimal redundancy maximizes the number of acceptable chips per wafer. When the redundancy increases, the yield of the individual chip tends to increase, but the number of chips per wafer tends to decrease. We therefore need to maximize the effective yield, which is the chip yield multiplied by the ratio between the number of chips with and without redundancy on the same size wafer.
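
The trade-off can be sketched numerically. The example below makes two simplifying assumptions that are not part of the article's model: the chips-per-wafer ratio is approximated by the inverse ratio of chip areas measured in module units, and module faults are treated as independent (no clustering, no gross yield factor). It is therefore not expected to reproduce the values plotted in Figure 7.

```python
from math import comb


def effective_yield(chip_yield, base_modules, spares, overhead=0.0):
    """Chip yield scaled by an area-based estimate of the chips-per-wafer ratio.

    `overhead` is the reconfiguration area in units of one module and is
    charged only when spares are present.
    """
    extra = spares + (overhead if spares else 0.0)
    return chip_yield * base_modules / (base_modules + extra)


def binomial_yield(needed, total, module_yield):
    """Probability that at least `needed` of `total` independent modules work."""
    return sum(
        comb(total, m) * module_yield ** m * (1 - module_yield) ** (total - m)
        for m in range(needed, total + 1)
    )


def best_redundancy(base_modules, module_yield, max_spares, overhead=0.0):
    """Number of spares that maximizes the effective yield."""
    def score(spares):
        y = binomial_yield(base_modules, base_modules + spares, module_yield)
        return effective_yield(y, base_modules, spares, overhead)
    return max(range(max_spares + 1), key=score)


print(best_redundancy(base_modules=4, module_yield=0.5, max_spares=8))
```

Replacing `binomial_yield` with a clustered yield model changes the location of the optimum, which is the point the discussion of Figure 7 makes.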

Figure 7 depicts the effective yield of the four-processor chip described at the beginning, with the yield of a single processor being 0.5. Unlike the simplified analysis given earlier, here we take into account the clustering phenomenon and the gross yield factor $Y_0$, for which we assume the value 0.9. Also, assume defects in the reconfiguration overhead area (for switches and interconnections) are chip-kill faults.

The four curves in Figure 7 show that the effective yield increases when we incorporate redundancy into the design until we reach an optimal value of redundancy. Beyond this value, the effective yield declines; additional redundancy contributes only to the area, not to the number of acceptable chips per wafer. As shown in Figure 7, the optimal redundancy for $\alpha = 5$ (only limited clustering) is $R = 4$. This is independent of the reconfiguration overhead, which changes from 0 to 1 times the area of one processor. The resulting number of acceptable chips from the wafer differs, though. We can obtain this number by multiplying the effective yield by the number of chips when no redundancy is introduced (that is, 60). Thus, an average of 10.11 and 18.88 acceptable chips per wafer are predicted for reconfiguration overheads of 1 and 0, respectively.

Clustering affects the projected yield and the resulting optimal amount of redundancy. For example, for $\alpha = 1$ the optimal redundancy is 2, with a projected number of acceptable chips of 11.64 and 17.84 for overheads of 1 and 0, respectively. Ignoring the clustering phenomenon for the sake of simplifying the task of yield projection might lead to incorrect decisions regarding the amount of redundancy to incorporate.

Defect-tolerant techniques for VLSI circuits have made remarkable progress in recent years. As we begin the 1990s, the theoretical approaches for introducing and optimizing redundancy, as well as restructuring technologies, appear to be in place for more widespread use of redundancy for yield enhancement. Meanwhile, as VLSI feature sizes approach physical limits, it is likely the need for larger area chips to meet the growing demand for more complex monolithic systems will become much more pressing. We can expect the use of defect-tolerance techniques to provide viable yields for these large chips to become routine in such an environment.

The largest circuits achievable through defect-tolerance techniques are full-wafer designs. After the disappointments of the early 1980s, progress here appears on track. Tadashi Sasaki, in his invited talk at the 1989 International Conference on Wafer-Scale Integration, pointed out that earlier trends (including those underpinning Japanese long-range planning) had predicted a WSI technology in place by the year 2000. However, progress has outstripped this schedule, and we can expect to see WSI systems in the 1990s.

Just as the most widespread use of defect-tolerant design today occurs in memory circuits, the first large-volume wafer-scale circuits will also likely be memory systems. One such circuit, the 40-megabyte wafer-stack module recently developed by Anamartic, consists of two 6-inch-diameter wafers. This module relies on the already well-established theory of fault tolerance in VLSI circuits and the well-developed technology of restructuring. Future circuits will as well. ■

Design Techniques for Testable Embedded Error Checkers


E.J. McCluskey 
Stanford University 


Many digital systems include error detectors, circuits that detect and signal the occurrence of errors. An error is the presence of a signal whose value is different from its value in a correctly operating system. Two main reasons for including error detectors in a system are to prevent the error from reaching the system output and to locate the site of the error. One way of preventing error propagation is to check the received data after data transfer and, if an error is detected, either request retransmission or correct the received data using an error-correcting code (as in RAMs). Another way is to check the result of an arithmetic operation and repeat the operation if the result contains a detected error. Large systems typically include error detectors, along with circuitry to record the location and frequency of error events. This information is used to facilitate accurate and fast maintenance. (For more details on the application of error detectors, see Ch. 9 of Rao and Fujiwara. 1 Another good source is Kraft and Toy. 2 )

Completely self-testing designs are available for all the important checkers. The techniques outlined here guarantee single-stuck fault testability.

The most familiar error detectors are parity checkers. They detect any odd number of incorrect bits in an n-bit word. The operation of an error detector depends on the presence of coded information: The information in the system is encoded into code words. These code words contain more than the minimum number of bits necessary to represent the information, since only a subset of all possible word bit patterns, called code words, are present in an error-free system. When a detectable error occurs in a word, it changes the word from a code word into a noncode word. For example, in a system that encodes each byte into a parity code, 9 bits are used for each byte. If an odd parity code is used, an odd number of bits in each byte equals 1. Any odd number of changed bits in a byte causes the byte to have even parity. Error detectors contain checkers, circuits that have coded information as inputs and determine whether code or noncode words are actually present. Thus, the checker for parity-encoded bytes is a nine-input circuit that calculates the parity of its inputs.
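
As an illustration, the odd-parity encoding of a byte and the corresponding nine-input check can be written in a few lines. The function names and the bit ordering are assumptions made for this sketch.

```python
def encode_odd_parity(byte):
    """Return 9 bits: the 8 data bits plus a check bit making the count of 1s odd."""
    data = [(byte >> i) & 1 for i in range(8)]
    check = 1 ^ (sum(data) % 2)
    return data + [check]


def parity_check(word):
    """XOR of all bits: 1 for a code word of the odd-parity code, 0 otherwise."""
    z = 0
    for bit in word:
        z ^= bit
    return z


word = encode_odd_parity(0b10110010)
print(parity_check(word))          # code word

word[3] ^= 1                       # single-bit error
print(parity_check(word))          # noncode word, error detected

word[5] ^= 1                       # second error restores odd parity
print(parity_check(word))          # double error goes undetected
```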

Since the function of a checker is to provide an appropriate output signal for noncode word inputs, testing the checker requires supplying noncode words as inputs. In a fault-free system, the checkers receive only code words as inputs. Thus, it is not possible to test a checker while it is embedded in a system unless this ability is specifically designed into the system. If the system includes a scan-path facility 3 as a design-for-testability feature, then any checker whose inputs come only from bistables (latches or flip-flops) can be completely tested by scanning an appropriate set of test patterns into the bistables. For checkers whose inputs do not all come from scan-path bistables, it is necessary to modify the checker design to permit complete single-stuck fault testing. (Sec. 4.1 of Wakerly 4 discusses the placement of checkers in a system.) The following sections present design techniques to ensure the testability of embedded checkers that cannot be tested by scan-path bistables.

Types of error detectors


The structure of an error detector is determined mainly by the error-detecting code used in the words to be checked. The most important characteristic of the code is whether it is separable or not. Each word of a separable code (also called systematic or separate code) can be divided into two parts: the data part that represents the information content and the check part that is determined by the data part. A parity check code is a separable code with the single parity bit equal to the check part of the code word. Another example is the residue code in which the check part is equal to the residue modulo M of the data part. A very common separable code is the duplication code in which the check part is identical to the data part. In a nonseparable code, it is not possible to determine which information word is represented from only a subset of the bits in the code word. The constant-weight (M out of N) code is a nonseparable code.

Any separable code can be checked by a structure such as that shown in Figure 1. If, as is usually the case, all bit combinations can occur in the data part, then there is no problem in testing the check symbol generator. For the equality checker, the situation is not so straightforward, since input combinations corresponding to error situations cannot occur when no faults are present. Test techniques for equality checkers are described in a later section.

Parity checkers and 
self-testing circuits 

The parity code is a separable code with the very special feature that the check part is a single bit. Because of this, parity checkers use a simpler structure than the general separable-code-checker structure of Figure 1. In fact, what is required is a network to compute the sum modulo 2 or Exclusive OR of the code word bits. Figure 2a shows a tree of two-input XOR gates that checks the parity of a 9-bit parity encoded byte. If an odd parity code is used, the output z equals 1 for all valid code words. (Other parity checker implementations are described in Sec. 4.1 of Sellers, Hsiao, and Bearnson. 5 )

The Figure 2a network can be considered as three subnetworks: subnetwork H, the inputs and gates connected to lead h; subnetwork J, the inputs and gates connected to lead j; and subnetwork Z consisting of the single gate connected to the output z. Any single-stuck fault (a fault that causes exactly one of the signal lines in the circuit to have its value fixed at logical 0 or 1 rather than having its value determined by a gate output or primary circuit input) in network H or network J can be tested, since these subnetworks receive all possible input combinations. In fact, these subnetworks have the self-testing property. A checker subnetwork is called self-testing if and only if any single-stuck fault in the subnetwork causes an error indication at the checker output for at least one valid code-word input to the checker. (Sometimes other fault models are used instead of the single-stuck fault, although this is, by far, the most common fault model. Fault models are discussed in Sec. 6.11 of my 1986 book. 3 ) Checkers are often designed to have the self-testing property to guarantee an error indication if a fault occurs in the network during normal operation.

Figure 1. General structure of checker for separable code. (Inputs: data part and check part)

Figure 2. Parity checkers for 9-bit odd-parity code: (a) XOR tree; (b) testable XOR tree.

Figure 3. Structure for converting a self-testing checker to a testable single-output checker. (STC = self-testing checker; SOC = single-output checker; CW = code word)



Figure 4. Circuit to combine n single-output checker outputs into a single error signal.


Self-testing checkers. Any completely self-testing checker, one for which the entire checker is self-testing, must have at least two outputs, since a stuck-at-no-error fault on a single output cannot be detected with a code word input. The usual practice is to design a self-testing checker with two outputs on which the signals 01 and 10 indicate fault-free operation and the signals 00 and 11 occur in response to a single-stuck fault in the checker or a noncode word input. These are the signals that occur on the h and j leads of Figure 2a, so this circuit is self-testing if it is acceptable to use two leads rather than one for an error indication. Any parity check circuit that has two outputs, each equal to the parity of one of two disjoint subsets of the inputs, is completely self-testing.
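
The self-testing claim can be checked by exhaustive simulation on a simplified model. In the sketch below the split of the nine inputs between the two subsets is an assumption, and only stuck faults on the primary inputs and on the two output leads are modeled; faults on lines inside the XOR trees are not represented.

```python
from itertools import product

H_INPUTS = (0, 1, 2, 3, 4)  # assumed subset feeding lead h
J_INPUTS = (5, 6, 7, 8)     # assumed subset feeding lead j


def parity(bits):
    result = 0
    for bit in bits:
        result ^= bit
    return result


def two_output_checker(word, fault=None):
    """Return (h, j). `fault` is (line, stuck_value) with line 'x0'..'x8', 'h' or 'j'."""
    bits = list(word)
    line, value = fault if fault else (None, None)
    if line is not None and line.startswith("x"):
        bits[int(line[1:])] = value
    h = parity(bits[i] for i in H_INPUTS)
    j = parity(bits[i] for i in J_INPUTS)
    if line == "h":
        h = value
    if line == "j":
        j = value
    return h, j


# Code words of the 9-bit odd-parity code, and the modeled single-stuck faults.
CODE_WORDS = [w for w in product((0, 1), repeat=9) if parity(w) == 1]
FAULTS = [(line, v)
          for line in [f"x{i}" for i in range(9)] + ["h", "j"]
          for v in (0, 1)]


def is_detected(fault):
    """True if some code word produces the error indication 00 or 11."""
    return any(two_output_checker(w, fault) in {(0, 0), (1, 1)} for w in CODE_WORDS)


print(all(is_detected(f) for f in FAULTS))
```

Every modeled fault is exposed by at least one code word, while the fault-free checker produces only 01 or 10 on code words.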

The self-testing property provides a general technique for ensuring the testability of embedded checkers. There are completely self-testing designs for all of the important checkers. 4 Most systems with embedded checkers require some indication that an error has been detected by an error checker. (System operation can then be halted and the conditions of the individual error checkers read out for diagnosis.) A facility must be provided to combine the outputs of the individual checkers into one indicator. Structures for combining the two-rail outputs of self-testing checkers are discussed in the next section.

Single-output checkers. If a single-out- 
put parity checker is required, it is possible 
to make the Figure 2a circuit fully testable 


by adding the additional circuitry shown 
by heavier lines in Figure 2b. (This design 
is similar to that discussed in Sec. 7.7.1 of 
Kraft and Toy. 2 ) During normal operation 
the added input “Test” is held at 0, and the 
circuit operation is unaffected by the added 
gate. For testing the output gate, the test 
input is set to 1 to place the 11 and 00 
combinations on leads h and j. Note that 
the added XOR gate is not self-testing, 
since its test input is always 0 for normal 
operation, but can be completely tested by 
setting Test=1.

The technique illustrated in Figure 2b is 
general in the sense that it can be used to 
convert any two-output self-testing 
checker into a testable single-output 
checker as shown in Figure 3. An XNOR 
(exclusive NOR or equivalence) gate, 
shown as the output gate, provides an ac¬ 
tive-high output error signal. 
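
The conversion to a single output can be sketched in a few lines of Python. This is a behavioral model under the assumption that the added XOR gate sits on the j lead and that the output gate is an XNOR; the function name is hypothetical.

```python
def single_output_error(h, j, test=0):
    """Behavioral model of a testable single-output checker (cf. Figure 3).

    h, j : the two outputs of a self-testing checker (0 or 1 each)
    test : 0 during normal operation, 1 to exercise the output gate
    """
    j_gated = j ^ test          # added XOR gate: passes j unchanged when test == 0
    return 1 - (h ^ j_gated)    # XNOR output gate: 1 (error) when the leads agree
```

In normal operation the complementary pairs 01 and 10 give 0 (no error). Setting test to 1 turns those same pairs into 00 or 11 at the output gate, so a fault-free checker then reports 1, which shows that the error output is not stuck at 0.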

Figure 4 shows a structure for combin¬ 
ing the outputs of n single-output checkers 
into one error signal. This circuit requires 
n test signals, one for each of the single¬ 
output checkers. The output OR gate is not 
self-testing, since it has all zeros at its 
inputs during normal operation. It is test¬ 
able, since the individual test signals can 
be used to apply all of the single-1 patterns 
required to completely test it for all single- 
stuck faults. 
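
A small Python model may make the test procedure for this OR structure concrete. It assumes each single-output checker behaves like the XOR-plus-XNOR arrangement described above; the names are illustrative.

```python
def combined_error(checker_pairs, tests):
    """Model n single-output checkers feeding one OR gate (cf. Figure 4).

    checker_pairs : list of (h, j) outputs, one pair per two-output checker
    tests         : list of n test signals, one per checker
    Returns (error, singles), where singles are the OR gate's inputs.
    """
    singles = [1 - (h ^ (j ^ t)) for (h, j), t in zip(checker_pairs, tests)]
    return int(any(singles)), singles


def single_one_patterns(n):
    """Test-signal settings that raise exactly one OR input at a time."""
    return [[1 if i == k else 0 for i in range(n)] for k in range(n)]


fault_free = [(0, 1), (1, 0), (0, 1)]
```

With all test signals at 0 the OR gate sees only zeros; each single-1 test setting drives exactly one OR input to 1.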

Two-rail checkers 

A disadvantage of the structure in Fig¬ 
ure 4 is that it requires n extra test signals, 
one for each individual checker. Figure 5 
shows a structure for combining the out¬ 
puts of n self-testing checkers that uses 
only one test signal. A two-rail checker, a 
circuit that checks that each pair of inputs 
has complementary values, is used to con¬ 
vert the n pairs of signals into one pair of 
signals that are complements if and only if 
all of the n input pairs have complementary 
signals. The two outputs are then converted
into a single testable output using the circuit
of Figure 3.

Figure 5. Circuit to combine n pairs of self-testing checker outputs into a single
error signal.

Figure 6 shows a two-rail checker design
that converts two pairs of input signals
to a single pair of output signals. This
circuit is self-testing if and only if all four
valid code words appear at the inputs during
normal operation. 4-6 A tree of circuits
such as that in Figure 6 can be used to
convert n pairs of inputs into a single output
pair. Whether or not this tree is self-testing
depends on whether the inputs it
can receive in correct operation are sufficient
to test it. (The design of two-rail
checkers is discussed in my earlier work
with Hughes and Lu 6 and with Khakbaz. 7 )
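
The behavior of a two-rail checker tree can be modeled in Python as follows. The gate equations used here are one commonly cited form of the two-pair cell and are an assumption; they are not read off Figure 6, and the function names are hypothetical.

```python
def two_rail_cell(a, b):
    """Reduce two signal pairs to one pair.

    The output pair is complementary if and only if both input pairs are.
    """
    (x0, y0), (x1, y1) = a, b
    f = (x0 & y1) | (y0 & x1)
    g = (x0 & x1) | (y0 & y1)
    return f, g


def two_rail_tree(pairs):
    """Reduce n pairs to a single pair with a tree of two-pair cells."""
    pairs = list(pairs)
    while len(pairs) > 1:
        reduced = [two_rail_cell(pairs[i], pairs[i + 1])
                   for i in range(0, len(pairs) - 1, 2)]
        if len(pairs) % 2:            # an odd pair passes through to the next level
            reduced.append(pairs[-1])
        pairs = reduced
    return pairs[0]
```

For example, the pairs (0,1), (1,0), (0,1) reduce to a complementary output pair, while replacing any one of them with (1,1) or (0,0) yields equal outputs.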

Equality checkers 

A very important type of checker is the 
equality checker or matcher, a checker that 
compares two input words to determine 
whether corresponding bits from the two 
words have the same value. The equality 
checker is the key component in checkers 
for separable codes (Figure 1) and for 
matching the outputs of duplicate circuits. 
The most straightforward design of an 
equality checker is illustrated in Figure 7 
with a circuit for comparing two 4-bit 
words, X and Y. Each bit pair is connected 
to an XOR gate whose output should al¬ 
ways be 0 if the system is operating cor¬ 
rectly. The outputs of the individual XOR 
gates are all connected to an OR gate whose 
output will be 0 as long as all of the XOR 
outputs are 0. 

While this is a very simple circuit, it is 
not self-testing. In fact, it is very difficult 
to test at all. Any stuck-at-0 fault on a gate 
output cannot be detected by any of the 
normal inputs to the circuit. The circuit can 
be made testable by adding another XOR 
gate and test signal to each of its XOR 
gates. This makes it possible to selectively 
complement one of the inputs to the original
XOR gates, as shown for the j input to
the output XOR gate in Figure 2. This 
modification more than doubles the com¬ 
plexity of the circuit, and it is still not self¬ 
testing. 

A self-testing equality checker can be 
obtained by complementing all bits of one 
of the words to be compared to form a two- 
rail code. The two-rail checkers discussed 
in the previous section can then be used to 
realize a testable equality checker. 4 An¬ 
other possibility is to concatenate the 
complemented and uncomplemented 
words to form a code word that has exactly 
half of its bits equal to 1. A testable k-out-of-2k
checker can then be used, as described
by Kraft and Toy, 2 to form a testable
equality checker. It is not clear that
this approach has an advantage over the 
two-rail checker schemes. 
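
The two-rail version of the equality checker can be sketched in Python as below. The cell equations are a commonly cited form and are assumed rather than taken from the figures; a linear chain of cells is used in place of a tree for brevity, which gives the same logical result.

```python
from functools import reduce


def two_rail_cell(a, b):
    """Two-pair two-rail checker cell (assumed gate equations)."""
    (x0, y0), (x1, y1) = a, b
    return (x0 & y1) | (y0 & x1), (x0 & x1) | (y0 & y1)


def words_match(x_bits, y_bits):
    """Compare two words by complementing Y and checking the two-rail code."""
    pairs = [(x, 1 - y) for x, y in zip(x_bits, y_bits)]   # (x_i, not y_i)
    f, g = reduce(two_rail_cell, pairs)
    return f != g       # complementary outputs mean every bit pair matched
```

When X and Y agree in every position, each pair (x, not y) is complementary, so the final outputs are 01 or 10. A single mismatched bit produces 00 or 11.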

M-out-of-N checkers

One class of embedded checker remains 
to be discussed: the M-out-of-N checkers.
Though less important than the others, 
these checkers are in use. The 1-out-of-n 
checker is used to monitor the correct 


operation of complete decoding networks. 
A self-testing implementation of this 
checker is formed by connecting all de¬ 
coded outputs that correspond to even- 
parity inputs to one OR gate and all outputs 
that correspond to odd-parity inputs to 
another OR gate. The outputs of the two 
OR gates have either 10 or 01 signals when 
the decoder is operating correctly. This 
design is self-testing, since the OR gates 
receive all inputs (00, 10, 01) required to 
test them for single-stuck faults during 
normal operation. It can be converted into 
a testable single-output circuit with the 
structure shown in Figure 3. (An extensive 
discussion of other designs for such check¬ 
ers is given in Sec. 7.4.3 of Kraft and 
Toy. 2 ) 
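
A behavioral sketch of this 1-out-of-n checker in Python follows. It assumes the decoder outputs are listed in address order, so the position of a line in the list is the input address whose parity decides which OR gate it feeds; the function name is hypothetical.

```python
def one_out_of_n_check(decoder_outputs):
    """Return (even_or, odd_or) for a list of decoder output lines.

    Line k is assumed to correspond to input address k. Lines whose address
    has an even number of 1s feed one OR gate; the rest feed the other.
    """
    even_or = odd_or = 0
    for address, line in enumerate(decoder_outputs):
        if bin(address).count("1") % 2 == 0:
            even_or |= line
        else:
            odd_or |= line
    return even_or, odd_or
```

One active line gives 10 or 01. No active line gives 00, and two active lines whose addresses differ in parity give 11. In this model, two active lines with the same address parity would still give 10 or 01.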

Constant-weight or M-out-of-N codes
are used when it is desired to detect not 
only all single-stuck-at faults, but also all 
unidirectional faults. A unidirectional 
fault is any fault that causes error signals 
that all have the same incorrect value. All 
errors are caused either by correct 0’s 
changing to incorrect 1’s or by correct 1’s
changing to incorrect 0’s; it is not possible 
to have an incorrect 0 and an incorrect 1 
both present in an erroneous data word. 
Single faults in inverter-free networks (for 
example, the internal part of a program¬ 
mable logic array) are accurately modeled 
as unidirectional faults. 

The 2-out-of-5 decimal code is the most
common example of an M-out-of-N code.
The k-out-of-2k or k-out-of-(2k+1) codes are
the usual implementations, since they
contain the maximum number of code
words for a given word length. 1 Self-testing
designs for M-out-of-N code checkers
have been studied extensively by fault-tolerant
computing theoreticians. 2,8,9 A
system implementation using these codes
is described by Cook. 10 Since these
checker implementations are self-testing, 
they do not present a testing problem. The 
methods described previously can be ap¬ 
plied to obtain testable single-output de¬ 
signs or to combine the outputs from sev¬ 
eral checkers into a single error indicator. 
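
The detection property of constant-weight codes can be checked exhaustively for a small case. The Python sketch below enumerates the 2-out-of-5 code and applies every unidirectional error pattern to each code word; it models only the code, not any particular checker circuit, and the function names are illustrative.

```python
from itertools import combinations
from math import comb


def code_words(m, n):
    """All n-bit words (as tuples) that contain exactly m ones."""
    return [tuple(1 if i in ones else 0 for i in range(n))
            for ones in combinations(range(n), m)]


def is_code_word(word, m):
    return sum(word) == m


def unidirectional_errors(word):
    """Yield words reachable by 1-to-0 errors only, or by 0-to-1 errors only."""
    for corrupted_value in (1, 0):
        positions = [i for i, b in enumerate(word) if b == corrupted_value]
        for count in range(1, len(positions) + 1):
            for flipped in combinations(positions, count):
                yield tuple(1 - b if i in flipped else b
                            for i, b in enumerate(word))


two_of_five = code_words(2, 5)
undetected = [e for w in two_of_five
              for e in unidirectional_errors(w) if is_code_word(e, 2)]
```

Every unidirectional error changes the number of 1s in the word, so the list of undetected errors is empty. The count of code words is the binomial coefficient, which for a given word length is largest when M is half the length (rounded either way).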



Figure 6. Two-rail checker for two in¬ 
put pairs of signals. 



Figure 7. An example of an equality 
checker structure that is not testable 
or self-testing. 
By using the appropriate design-for-
testability technique, as described 
here, it is possible to guarantee 
single-stuck fault testability for all the 
important embedded checker designs. ■ 
The history of Posix: A study in the standards process


James Isaak, Digital Equipment Corporation 


Posix has become the acronym for the 
important standard formally known as 
Portable Operating System Interface for 
Computer Environments. An organiza¬ 
tion originally called /usr/group was the 
precursor to the Posix effort when 
founded as a commercial Unix users 
group in 1980 to focus on AT&T’s Unix 
operating system. 

The organization’s name is now Uni- 
Forum, but at the time /usr/group was 
being established, a handful of compa¬ 
nies had begun putting Unix-based and 
Unix-compatible systems on minicom¬ 
puters and microcomputers. The dispar¬ 
ity of versions posed a concern that /usr/ 
group sought to address by establishing a 
formal committee. 

To keep the process manageable, the 
initial group was limited to 40 persons. 
Heinz Lycklama of Interactive Systems 
served as chair. Members adopted a re¬ 
quirement for a two-thirds majority as the 
criterion for decision making. 

At the time, AT&T was introducing its 
Unix System 3, which followed Version 
7; the University of California at 
Berkeley was developing a derivative 
called BSD version 4; and a few commer¬ 
cial vendors — notably Interactive Sys¬ 
tems, Microsoft, Human Computer Re¬ 
sources (HCR), and Perkin-Elmer — 
were promoting versions for various 
minicomputer systems. 

The first microprocessors the commit¬ 
tee targeted were a Zilog 8000, with an 
implementation by Onyx, and various 
implementations on Digital PDP 11/23 
systems. By this time, at least three Unix- 
like products were also available on the 
market: Idris, from Whitesmith’s; Coher¬ 
ent, from Mark Williams Company; and 
Unos, from Charles River Data Systems. 
The divergent paths of AT&T-licensed 
versions and the existence of indepen¬ 
dently developed implementations set 
the tone for the standards work. 

Key guidelines set up. The participa¬ 
tion of the three non-AT&T licensed sup¬ 
pliers led to establishment of two key 
guidelines in the group: 
• First, the resulting document would
need to stand on its own and not depend on 
a specific product as a reference point. As 
such, independent players should be able 
to develop systems to match the standard, 
not just buy systems from a single sup¬ 
plier. 

• Second, the focus would have to cen¬ 
ter on an interface description. 

This early targeting of an independent- 
vendor approach was a key to later accep¬ 
tance in the accredited standards process. 
The focus on applications portability was 
also important in establishing a scope for 
the work. The group focused on the pro¬ 
gram interfaces rather than on the admin¬ 
istrative, communications, or shell and 
utilities aspects of the system. 

Traditional Unix systems organized 
the documentation in a way that paral¬ 
leled the implementation. They sepa¬ 
rated items implemented as library func¬ 
tions from those implemented as intrin- 
sics. The /usr/group effort eliminated 
such distinctions. Instead, they used a 
strict alphabetic listing. There were simi¬ 
lar mixtures of services specific to the C 
language and to operating systems. 

By mid-1984, the document was com¬ 
plete and was distributed to the /usr/ 
group members for balloting. The results 
were published as the “1984 /usr/group 
Standard.” The group next looked into 
how the work could be placed before the 


American National Standards Institute 
(ANSI) and/or the International Organi¬ 
zation for Standardization (ISO) for ac¬ 
ceptance as an accredited standard. 

Independent of the /usr/group effort, 
the IEEE had authorized a project in 1983 
to work on a “standard for an operating 
system kernel based on the Unix operat¬ 
ing system.” Some of the /usr/group 
members joined the IEEE effort to try to 
promote alignment, only to learn that the 
IEEE effort was temporarily inactive. 
Nonetheless, the /usr/group transferred 
its work to the IEEE in January of 1985 
and the group’s active committee mem¬ 
bers aligned with the IEEE. The /usr/ 
group had found that the IEEE met the 
group’s key objective: it possessed status 
as an accredited ANSI standards devel¬ 
opment organization. But the deciding 
factor was that IEEE standards work de¬ 
pends on individual professional in¬ 
volvement. This differentiates IEEE 
standards work from groups in which in¬ 
stitutional representation is more com¬ 
mon. 

Achieving consensus. Those of us 
who were members of the IEEE start-up 
working group found we needed to study 
the required policies and procedures. 

But, one of the most useful tools we 
learned about is not a formal IEEE proce¬ 
dure at all but a methodology for identify¬ 
ing consensus that had evolved from soft¬ 
ware engineering groups. The methodol¬ 
ogy entails the following list of ques¬ 
tions: 

(1) Has your viewpoint been ex¬ 
pressed? 

(2) Do you feel you understand the 
other viewpoints expressed? 

(3) What seems to be the consensus 
position? 

(4) What objections to that position 
exist? 

(5) Is there a more solid common 
ground or a clear rationale for the selec¬ 
tion of the consensus position? 

This process can save substantial time 
and spur the group towards much higher
levels of overall consensus than voting 
procedures. Instead of using straw polls 
or other procedures that tend to segment 
the group, this method directs the group 
towards identifying the common ground. 

Basically, the first point conveys the 
idea that “we don’t need to hear the same 
thing five times.” This can save a substan¬ 
tial amount of time with those who feel 
obligated to state their agreement or re¬ 
state the arguments in favor of some 
view. The focus then goes on the points of 
difference, and how they can be elimi¬ 
nated — no matter how many people in 
the room have that view. 

This point in the IEEE standards pro¬ 
cess needs to be understood. Ultimately, 
the document must gain 75 percent accep¬ 
tance of a balloting group, which is typi¬ 
cally a larger and more diverse group than 
the working group. One person’s view 
can influence the whole group, and we 
have often found that a single individual 
can change or even reverse the corporate 
position of the group. 

Point 2 represents the other side of 
Point 1: Only if you feel your view has 
been heard and understood will you feel 
you do not need to state it again. 

Point 4 constitutes the heart of the pro¬ 
cess. The speed of decision-making can 
advance substantially when you replace 
the question “Does anyone else have 
something to say?” with “Does anyone 
have an objection to this course of ac¬ 
tion?” 

Rationale written. This doesn’t mean 
that all objections can be resolved. The 
Posix process required some time for us 
to realize that we should document these 
points as well as the rationale for our 
course of action. In fact, /usr/group con¬ 
tracted someone to write a rationale for 
the document in 1986. The rationale has 
provided a useful tool to brief new people 
and identify holes in previous lines of rea¬ 
soning. 

To help expedite the IEEE standard, 
both /usr/group and AT&T made their 
copyrighted materials available as a basis.
As a result of the initial /usr/group
work, three spin-off documents were ini¬ 
tiated: the Posix document, the X3J11 C 
language standard, and the AT&T Sys¬ 
tem V Interface Definition (SVID). 

While SVID is not a peer of the others as a 
consensus standard, it did provide a com¬ 
parison point to a specific implementa¬ 
tion. AT&T helped the group understand 
where Posix differed from System V, and 
this constituted another form of identify¬ 
ing objections and considering rationale. 
Other major implementation sources 
were UC Berkeley and Usenix. 

/usr/group remained active in this area 
as well. Lycklama took on the co-chair 


role for the IEEE group and also chaired a 
newly formed set of /usr/group technical 
committees. The /usr/group committee 
formed a basis for /usr/group participa¬ 
tion in the IEEE activity as an institu¬ 
tional representative (with organiza¬ 
tional as opposed to individual balloting 
recognition). The technical committees 
also provided a forum to discuss and de¬ 
velop concepts beyond the scope of the 
IEEE work. These included real time, se¬ 
curity, and internationalization. 

The concurrently formed C language 
group prompted one of the first territorial 
disputes the Posix group encountered. 
Since the 1984 /usr/group document did 
not draw a line between them, both 
groups were developing interfaces in 
some areas. The basis for resolution on 
this matter was a somewhat unilateral de¬ 
cision by the Posix group to give as much 
as possible to the X3J11 group. The Posix 
group determined that all Posix systems 
would have C but that not all C implemen¬ 
tations would have Posix. Therefore, it 
was decided the C environment should be 
as rich as possible. Still, the C group 
didn’t accept everything the Posix group 
was willing to hand off. Some of the items 
subsequently returned to the IEEE docu¬ 
ment. 

Participation expands. The transition 
under the formal standards umbrella at¬ 
tracted more people. The original group 
of 40 quickly grew to 70, and participa¬ 
tion from major computer vendors in¬ 
creased. The US National Bureau of Stan¬ 
dards, since renamed the National Insti¬ 
tute of Standards and Technology, was 
another actively involved major player. 

At first, the NBS was only linked to the 
evaluation process. As it became clear 
that government buyers were going to 
want some of the benefits promised by 
open systems, the NBS started taking a 
proactive role. Over the past six years, 
NBS/NIST has participated actively, de¬ 
veloped conformance test suites, and put 
forth Federal Information Processing 
Standards as guidelines for agency pro¬ 
curements. NIST employees chair three 
committees. The practical value of major 
user support cannot be ignored, particu¬ 
larly when combined with procurement 
tools such as the FIPS and the confor¬ 
mance tests. 

The convergence of NBS/NIST user 
interest and the impact of new technology 
systems (microprocessors, workstations 
and RISC) combined to focus significant 
resources on Posix, points which have 
differentiated the Posix work from many 
other standards. The other side of this 
story has been the pragmatic “engineer¬ 
ing” view of the Posix working groups to 
getting the document out the door in a 
manner responsive to industry needs, as 


opposed to a more formal approach. 

There is a price to pay for a lack of for¬ 
mality. We are currently in the mode of 
revising the document to a language-in¬ 
dependent format, since the initial work 
was all C-specific. We need to bring a 
more object/structure-oriented organi¬ 
zation to the material. We need to state the 
requirements in more concise language, 
perhaps in terms of testable assertions or 
more formal techniques. However, if 
these things had been attempted in 1985, 
the initial document would not be com¬ 
plete today and the window for broad ac¬ 
ceptance would have closed. The Posix 
process is not a model of the ideal way to 
produce a standard, but reflects one of the 
most efficient. 

Trial-use version. In 1986, the group 
successfully balloted a trial-use version 
of Posix. We revised the initial project 
title to remove the implementation con¬ 
cept of “kernel” and focused on “inter¬ 
faces.” Since our document was pro¬ 
duced on line (using Troff), we were able 
to provide the publisher with camera- 
ready copy. The document probably set 
some significant speed records, from the 
commitment of /usr/group to the IEEE in 
January 1985, to December 1985 when 
we started the balloting, to publication of 
the trial-use document in April of 1986. 

The trial-use standard did not take the 
world by storm, but did demonstrate the 
ability of the group to produce. In this 
case, trial use allowed the group to come 
out with a document that represented 90 
percent of the ultimate results and gain 
visibility of the issues. The process can 
cost six months of delay in getting the fi¬ 
nal document out two years later. A simi¬ 
lar approach in the more pressured era of 
1988 would not have been practical. 

In the last meeting before going to ballot
in 1985, we went through a major
change in our meeting style. Up to that 
point, all work was done in “the main 
group.” From 1986 on, we divided the full 
group into smaller units, each focusing 
on a topic area, and brought back issues 
and proposals to the larger group for ac¬ 
ceptance. This clearly made sense in a 
group with 70 people. Nonetheless, it ex¬ 
posed a concern that continues in various 
forms today: that of how an individual 
can cover all the groups critical to an or¬ 
ganization. 

In 1986, we started to expand the work, 
adding groups for shells and utilities 
(1003.2); test methods (1003.3); and real 
time (1003.4). These groups reflect the 
various ways in which the work has ex¬ 
panded over time. Hal Jespersen started 
the “dot-2” group, a natural extension to 
complete the initial objective focused on 
defining operating interfaces based on 
the Unix model. Roger Martin of NIST 
started the 1003.3 group; this reflected
his employer’s need for conformance 
tests and attracted a number of new 
people with a testing orientation. The 
third group was a rollover from the /usr/ 
group effort. Bill Corwin, who had 
chaired the /usr/group effort, continued 
as chair when the IEEE took over. 

The first meeting in 1986 was held in 
association with the Italian Unix Users 
Group in Florence, Italy. IEEE projects 
are open to international participation 
and, in areas where we seek international 
input and acceptance, meetings outside 
the US can be critical. Here, we saw two 
concerns. First, fewer of our regular at¬ 
tendees could attend. There was also a 
concern that this was a boondoggle, and 
terrorist threats in response to the attack 
on Libya in April of 1986 discouraged a 
certain amount of travel. Nearly 50 per¬ 
cent fewer attended. As a result, the is¬ 
sues addressed and progress made were 
slightly different from the previous US- 
based meetings. 

We used the meeting as a sounding 
board for our future priorities and to 
broaden European awareness of our 
work. We met with members of the X/ 
Open group and established communica¬ 
tions links we have been using ever since. 
It was easier to bring Posix to the ISO in 
1987 because we had already demon¬ 
strated our interest in the international 
aspects of the work. 

Tricks of the trade. The trial-use stan¬ 
dard did not provide a complete picture of 
the work that 1003.1, the IEEE operating 
system group, felt was required. The 
document posed five specific questions 
to the industry for feedback, and the final 
resolution of these exemplified some key 
“tricks of the trade.” Some of them reflect 
differences between the Berkeley and 
System V implementations, and two re¬ 
flect the group’s sense that the traditional 
implementations were deficient. 

One significant area of concern in the 
trial-use standard was with “signals,” a 
method Unix uses for flagging excep¬ 
tional conditions. Signals can be used to 
interrupt or to kill a process. In the Sys¬ 
tem V version, it was not possible to en¬ 
sure which of these would occur. The 
Berkeley version provided masking and 
queuing mechanisms so a process could 
manage the impact of signals. In the case 
of System V versus BSD functionality, 
the conclusion was to build on the 
Berkeley approach, with a limited com¬ 
patibility with the System V approach. 
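
The masking idea can be illustrated with a short Python sketch. It uses Python's standard signal module as a stand-in for the C interfaces, and it assumes a Unix-like system and that the code runs in the main thread; the handler and function names are hypothetical.

```python
import signal
import threading

received = []


def on_usr1(signum, frame):
    # Record each delivery so the caller can see when the handler ran.
    received.append(signum)


def demo_masking():
    received.clear()
    signal.signal(signal.SIGUSR1, on_usr1)
    # Block SIGUSR1: a signal sent now is held pending, not delivered.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
    pending = signal.SIGUSR1 in signal.sigpending()
    seen_while_blocked = len(received)
    # Unblock: the pending signal is delivered and the handler runs.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
    return pending, seen_while_blocked, len(received)
```

The process decides when it is prepared to take the signal: nothing reaches the handler while the signal is blocked, and the held signal arrives once the mask is lifted.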

A second difference between BSD and 
System V in the trial-use standard was the 
handling of filenames that were too long. 
System V behavior is truncation to the 
length limit (which is a Posix implemen¬ 
tation parameter). The BSD approach is 


to flag such filenames as an error (with 
the error name: “ENAMETOOLONG”). While
technical arguments could be applied, it 
became obvious that we would not gain 
the required consensus if we selected one 
of these approaches over the other. This is 
when we implemented the second 
“trick”: compromise. Posix has support
of the “ENAMETOOLONG” error as an optional
characteristic, with the default behavior
being the System V truncation.
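
A portable program therefore cannot assume either behavior. As a minimal sketch, the Python function below probes what the local system does with an over-long filename; it assumes a Unix-like system where the name-length limit is available through os.pathconf, and the function name is hypothetical.

```python
import errno
import os
import tempfile


def long_name_behavior():
    """Report how this system treats a filename longer than the limit."""
    with tempfile.TemporaryDirectory() as directory:
        name_max = os.pathconf(directory, "PC_NAME_MAX")
        path = os.path.join(directory, "x" * (name_max + 10))
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        except OSError as exc:
            if exc.errno == errno.ENAMETOOLONG:
                return "error"        # the name was rejected
            raise
        os.close(fd)
        return "truncated"            # the name was accepted and shortened
```

Which of the two results appears depends on the system and file system in use.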

A third point of difference in the trial- 
use standard occurred in the handling of 
I/O operations and whether one process 
can interrupt another. In System V, the 
flag O_NDELAY indicates that a process
should return a zero status and not block 




on an I/O operation. However, this same 
status occurs on end-of-file read opera¬ 
tions, making it difficult to determine the 
cause. BSD defined the same flag but 
with different semantics, returning the 
error “EWOULDBLOCK” if the I/O would
block, and with a slightly different scope 
of effect for the flag. The solution in this 
case was to introduce a new flag, 
O_NONBLOCK, and not reference O_NDELAY
at all, with semantics that differed in key 
ways from both the System V and the BSD 
approaches. This allowed systems to con¬ 
tinue to offer compatibility with the his¬ 
torical behavior and at the same time sup¬ 
port the Posix requirement. 
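
The practical effect is that a nonblocking read can tell "no data yet" apart from end-of-file. The Python sketch below shows this on a pipe, using the standard os module on a Unix-like system; the helper names are illustrative.

```python
import os


def read_status(fd):
    """Classify the outcome of one nonblocking read."""
    try:
        data = os.read(fd, 16)
    except BlockingIOError:
        return "would block"          # no data available yet
    return "end of file" if data == b"" else "data"


def demo_nonblocking_pipe():
    reader, writer = os.pipe()
    os.set_blocking(reader, False)    # sets O_NONBLOCK on the read end
    results = [read_status(reader)]   # pipe is empty, writer still open
    os.write(writer, b"hi")
    results.append(read_status(reader))
    os.close(writer)                  # no writers left: next read sees EOF
    results.append(read_status(reader))
    os.close(reader)
    return results
```

An empty pipe with a live writer reports an error condition, while a pipe whose writer has closed returns zero bytes, so the two cases are no longer confused.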

Ioctl (or I/O control) is a function in 
Unix-derived systems used as a general 
catchall for controlling I/O devices. The 
working group felt that some device con¬ 
trol (at least terminal characteristics like 
baud rate) needed interfaces. During the 
trial-use standard period, however, we 
could not break a deadlock between tradi¬ 
tional Ioctl use, which permitted variable 
length structures to be used with inter¬ 
mixed and overloaded I/O fields, and a 
network “sanitized” version that in¬ 
cluded length-of-structure information 
and clarified the direction of dataflow for 
each component of the structure. 

The solution to this dilemma was to de¬ 
fine a set of higher level functions (called 


terminal I/O control or Termio functions) 
in which each function has a well-defined 
structure implied by the name of the call. 
These can be implemented on either the 
existing Ioctl versions or by using some 
other approach more consistent with dis¬ 
tributed processing. The trick of note 
here is to determine if a higher level ab¬ 
straction of the problem can span the 
viewpoints and result in consensus. 
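
As an illustration of named terminal-control calls replacing a catchall Ioctl, the sketch below uses Python's standard termios module, which wraps the tcgetattr and tcsetattr calls. It assumes a Unix-like system where a pseudo-terminal can be opened for the demonstration; the function names are hypothetical.

```python
import os
import pty
import termios


def set_line_speed(fd, speed):
    """Set input and output speed on a terminal and return the new output speed."""
    attrs = termios.tcgetattr(fd)     # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[4] = speed                  # input speed
    attrs[5] = speed                  # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return termios.tcgetattr(fd)[5]


def demo_line_speed():
    master, slave = pty.openpty()     # a pseudo-terminal stands in for a real line
    try:
        return set_line_speed(slave, termios.B9600)
    finally:
        os.close(master)
        os.close(slave)
```

Each call has a fixed, well-defined argument structure, so the caller never passes a device-specific request code or a variable-length buffer.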

File and record locking. The final is¬ 
sue raised in the trial-use standard was re¬ 
lated to file and record locking. The 
working group had developed one ap¬ 
proach and System V another. In addi¬ 
tion, there were questions about whether 
locking should be enforced on other pro¬ 
cesses using the same resources or be a 
condition that should be tested by other 
processes in some cooperative form. 

The mandatory locking issue raised 
questions of abuse and security concerns, 
since the model could be applied to many 
system resources. The decision favored a 
cooperative approach. 
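
The cooperative model can be demonstrated with a short Python sketch using the standard fcntl module on a Unix-like system. A parent process locks part of a file; a child process that asks for the same lock is refused, but a child that ignores the protocol can still write. The function name and the exit-code encoding are illustrative.

```python
import fcntl
import os
import tempfile


def demo_advisory_lock():
    """Return (lock_refused, write_succeeded) as observed by a child process."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        # Parent takes an exclusive lock on the first five bytes.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 5, 0)
        pid = os.fork()
        if pid == 0:
            code = 0
            try:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 5, 0)
            except OSError:
                code |= 1             # cooperating process sees the lock
            try:
                os.pwrite(f.fileno(), b"ZZ", 0)
                code |= 2             # uncooperative write is not prevented
            except OSError:
                pass
            os._exit(code)
        _, status = os.waitpid(pid, 0)
        code = os.WEXITSTATUS(status)
        return bool(code & 1), bool(code & 2)
```

The lock is a condition that other processes test, not a barrier the system enforces on them.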

While the System V approach was con¬ 
sidered, the working group felt the trial- 
use method was technically superior. 
About a year after this issue was resolved, 
X/Open started participating as an insti¬ 
tutional representative. X/Open delegate 
Mike Lambert immediately wanted to 
open the locking issue, raising concerns 
that another day or two of debate would 
be needed to fully orient X/Open. How¬ 
ever, using a simple chart and five min¬ 
utes of discussion, the working group 
overwhelmingly changed to support the 
System V approach. 

The IEEE’s public open process leaves 
the door open to new participants at any 
meeting. Often, such people need to be 
briefed to avoid forcing the group to go 
back to old issues. The rationale for deci¬ 
sions can be a key tool to help reduce this 
impact. However, as this example sug¬ 
gests, a new participant with a different 
viewpoint on the issue can have signifi¬ 
cant influence on the work, much to the 
benefit of the standard. 

Helen M. Wood, this year’s IEEE Com¬ 
puter Society president, was the society’s 
vice president for standards when we 
reached the point of taking Posix to ballot 
in 1987. Wood was employed at the NBS/ 
NIST at the time and was also vice chair of 
the US Technical Advisory Group for the 
ISO SC22 group. 

This blend of roles provided a boot¬ 
strap for Posix into the ISO arena and pro¬ 
vided the leverage of years of expertise. 
Wood was able to instigate action in the 
ISO long before the IEEE working group 
could have understood the paths and pro¬ 
cedures. Other ISO experts like Bob Fol- 
let, Gary Robinson, Sava Sherr, and Fran 
Schrotter provided key guidance and support
in getting the right documents to the
right committees. 

After holding some preliminary ISO 
meetings to get a better understanding of 
Posix and the System Software Interface 
(SSI) work proposed by Japan, the com¬ 
mittee recommended assigning the Posix 
work to SC22 in May 1987. 

The ISO working group was formed at 
the SC22 plenary meeting in Washing¬ 
ton, DC, and agreement was reached to 
register the document in the IEEE ballot¬ 
ing process as a draft proposed standard 
in the ISO. This allowed us to ballot in the 
ISO and the IEEE at the same time and co¬ 
ordinate our responses so that the docu¬ 
ment proposed for the next ISO step (a 
draft international standard) would be 
identical to the IEEE-approved standard. 
Since we could coordinate the close of 
balloting in both cases (holding the bal¬ 
loting open for the first one to end until 
the second was completed), we were able 
to accomplish our identical-document 
objective. We have been using a similar 
model for subsequent international stan¬ 
dards synchronization, with significant 
success. 

Additional activities launched. The 

standards work requests in the Computer 
Society’s Technical Committee on Oper¬ 
ating Systems (TCOS) have been ex¬ 
panding rapidly over the past 18 months. 
In addition to extensions and expansion 
on the Posix model, three other major ar¬ 
eas of activity have been launched. One is 
the profiling area of work (see Computer, 
February 1990, pp. 69-70), which relates 
suites of standards to specific application 
domains. Associated with this is the 
Guide to Posix Open System Environment
being developed by P1003.0. In addition,
TCOS has undertaken the formal
standardization of the software Applica¬ 
tion Program Interface (API) elements of 
the X Window System, with the 1201.x 
series of projects. Finally, some of the ar¬ 
eas of networking APIs are Posix inde¬ 
pendent, so we have work such as APIs 
for X.400 and for File Transfer, Access,
and Management (FTAM).

Simultaneously with this expansion of 
committee work, Donn Terry agreed to 
take over as chair of the P1003.1 working
group and also to chair the US TAG to 
establish a US position on the ISO’s Posix 
(JTC1/SC22/WG15) work. We formed a 
Sponsor Executive Committee to assist 
in coordination and recently have been 
adding a level of steering committees 
and vice chairs to the executive commit¬ 
tee to provide a framework for coordinat¬ 
ing the work. With 19 working groups, 

29 projects (the overlap reflects exten¬ 
sion work and related documents being 
done within existing groups), and more 
than 300 active participants, the work 


requires a high degree of coordination. 

We have instituted a fee structure to 
cover our mailing expenses, which run 
into six digits annually, and our meeting 
coordination costs. Collected fees have 
allowed us to expand and permit partici¬ 
pation and communications with any in¬ 
terested person without bankrupting the 
Computer Society or creating financial 
hardships for the volunteers. 

The European Commission hosted our 
meeting in Brussels in October 1988. Es¬ 
tablishing a solid connection to Europe, 
which is quickly growing to hold the plu¬ 
rality of information technology pur¬ 
chasing (it is larger than North America 
or Asia), is a key to future acceptance of 
the standards. The European and non-US 
influence on our work has been growing. 
The European Unix Users Group is regu¬ 
larly represented and is reporting prog¬ 
ress (as well as raising questions). We 
have regular participation from British 
Telecom, the UK government, Canada, 
Germany, and Japan, with active tracking 
though less regular attendance from Swe¬ 
den and the European Commission. Two 
of our working group vice chairs attend 
from Europe (Germany and Britain). 

Future challenges. The greatest chal¬ 
lenge we face over the next few years is to 
find channels to help advance work in ar¬ 
eas where the actual task of developing a 
formal standard is premature. Fortu¬ 
nately, groups like UniForum, X/Open, 
OSF, Unix International, and the X Con¬ 
sortium provide forums for this. 

With input from the industry, we can 
focus the TCOS efforts on the priority ef¬ 
forts where standards can be produced ef¬ 
fectively and leverage these other organi¬ 
zations in this undertaking. 

One example now coming to fruition is 
the X Lib concept of directly balloting a 
component of the X Window System. 

This work from the X Consortium, with 
support from X/Open, will be taken di¬ 
rectly into the IEEE balloting process to 
establish a consensus. This is not a rub¬ 
ber-stamp undertaking, since objections 
can still result in changes, all objections 
must be addressed, and the need remains 
to gain a 75-percent approval as with any 
other document. 

If we succeed in this venture, it can help 
provide the formal industry review and 
consensus process needed for adoption of 
standards, leveraging the work of these 
other organizations without duplicating 
effort. Fortunately, these groups reflect 
some degree of international participa¬ 
tion and a strong interest in serving the 
needs of the information-technology in¬ 
dustry on an international basis. Interna¬ 
tional participation and acceptance will 
be essential in serving the global industry 
in the 21st century. 



Robert Noyce dead at 62 

Robert Noyce, coinventor of the inte¬ 
grated circuit, cofounder of Intel, and 
president and chief executive officer of 
Sematech, died Sunday, June 3, of car¬ 
diac arrest. He was 62. 

Noyce earned his PhD in physics from 
MIT in 1953 and became a researcher at 
Philco before joining the Shockley Semi¬ 
conductor Laboratory. Soon after, Noyce 
and other Shockley engineers founded 
Fairchild Semiconductor, where he 
worked on technology for the integrated 
circuit in what became a patent race with
Texas Instruments’ Jack Kilby, who was
working on the same idea. Noyce and 
Kilby eventually shared credit. 

In 1968 Noyce and Gordon Moore cre¬ 
ated Intel, and in 1988 he was named 
president and CEO of Sematech, a con¬ 
sortium formed to improve US standing 
in semiconductor manufacturing. 

Noyce received the National Medal of 
Technology in 1987 and the National 
Medal of Science in 1980. He was also a 
charter recipient of the Computer Soci¬ 
ety’s Pioneer Award and a member of the 
National Academy of Science, National 
Academy of Engineering, and the Ameri¬ 
can Academy of Arts and Science. 

His wife, Ann Bowers Noyce, has 
asked that contributions be made to the 
American Institute for Learning, 408 
Congress Avenue, Austin, TX 78701,
Attn. Richard Halpin or Denise Meikel.



Robert Noyce 
Women, minorities, and computer science: Putting them all together


Galen Gruman, Staff Editor, IEEE Software 


The growing need for computer scien¬ 
tists and diminishing interest in com¬ 
puter-science careers among white males 
have focused attention recently on what 
has already been a concern for two dec¬ 
ades: the low percentage of women and 
minorities in computer science and in the 
sciences and engineering in general. 

Education experts, computer-science 
faculty, and government officials point 
to several underlying problems: 

• Science is not made exciting or inter¬ 
esting in schools. Rote memorization of 
facts, not problem-solving techniques, 
predominates precollege curricula. 

• Girls of all races are discouraged 
from pursuing interests in the sciences, 
especially physical and mathematical 
sciences. This discouragement is subtle 
and comes from all quarters: family, 
teachers, and popular culture. 

• Minorities are not challenged to ex¬ 
cel. Again subtle, this seems to be based 
on double standards for minorities, who 
are expected to do worse and thus are al¬ 
lowed to do so. A California study 
showed that minority children entered 
kindergarten almost on par with their 
white peers (the differences were attrib¬ 
utable to poverty), but they lagged a half 
grade behind by third grade, two grades 
by eighth, and three grades by 12th. 

• There are few role models for women 
and minorities at the college level, both 
because there are fewer women and mi¬ 
norities in the sciences and because even 
fewer stay in academia. According to sev¬ 
eral people interviewed, women leave 
academia for two reasons: higher paying 
jobs in industry and the perception that 
they cannot succeed in a tenure system 
marked by subjective, secret evaluations 
and the constant need to prove them¬ 
selves to older male colleagues. (How¬ 
ever, on January 9, the US Supreme Court 
said universities are not exempt from 
civil-rights laws that allow federal re¬ 
view in cases of alleged sexism.) 

There are no quick fixes to the prob¬ 
lems of science education, according to 
those interviewed. 

The long-term need is to raise all stu¬ 
dents’ interest in science and to improve 


education, while also narrowing the per¬ 
formance gap between minorities and 
whites and the interest gap between 
women and men, said Elizabeth Stage, 
executive director of the California Sci¬ 
ence Project. The precollege project fo¬ 
cuses on problem-solving and hands-on 
science, not rote learning, and its evalu¬ 
ations depend less on standardized test 
scores than on performance, she said. 

At the federal level, the National Acad¬ 
emy of Sciences and the National Re¬ 
search Council have convened colloquia, 
issued reports, and promoted math and 
science standards. The National Gover¬ 
nors Association also has made several 
recommendations. Its report, Increasing
the Supply of Women and Minority Engi¬ 
neers, will be released July 29 and is ex¬ 
pected to list most state programs. 

At the local level, some school districts 
have joined with local businesses to pro¬ 
vide facilities, speakers, and supplies for 
disadvantaged children. Such efforts 
tend to focus on inner-city schools and 
minority students, said Tom DeMarco, a 
software engineer involved with such an 
effort in Maine. 

At all levels of education, “most pro¬ 
grams we now have for disadvantaged 
students are boring,” Stage said. “The
longer they’re in them, the worse off they
are.” She cited work at San Diego State
University and at Stanford University
that shows the effectiveness of programs
that focus on increasing expectations.

At the college level, past NSF programs
have focused on funding individual
women and minorities through research
assistantships and the like. Such
programs will continue, but the NSF has
started a new project to encourage institutions
to attract and retain women and minorities
at the doctorate level, rather than
leaving it to individuals to find supportive
environments on their own, said Sue
Kemnitzer, deputy director of the engineering
infrastructure division.

A popular technique to help women
and minorities pursue computer-science
careers is mentoring, where professional
women help female students and address
student groups. But a recent National Science
Foundation report (Women in Computer
Science by Nancy Leveson) notes
that mentors must assume this task in addition
to their teaching, research, and
family obligations; it is usually not accounted
for or rewarded in their jobs.

[The complete version of this story in
the July issue of IEEE Software examines
solutions to these problems. Ed.]


Research on gigabit networks gets funding

A number of universities, national
laboratories, and supercomputer centers
will begin research into gigabit network
communications under a $15.8 million
award announced June 8 by the National
Science Foundation and the Defense Advanced
Research Projects Agency.

The three-year award to the nonprofit
Corporation for National Research Initiatives
will primarily support university
research, but several industrial sponsors
will contribute switching, interface, and
computer technology. The NSF and
DARPA expect overall industrial support
to exceed the federal award.

Five testbeds around the country will
examine applications requiring gigabit
networks, such as weather modeling,
geologic mapping, chemical dynamics,
and real-time analysis of sensor data.

The project is also part of the research
necessary to develop a National Research
and Education Network, which has been
proposed in both the Senate and the House
of Representatives and by the Office of
Science and Technology Policy.

The universities include Carnegie
Mellon, the Massachusetts Institute of
Technology, Caltech, the University of
California at Berkeley, the University of
Pennsylvania, the University of Wisconsin,
and the University of Illinois.


NEWS FROM THE COMMITTEE ON PUBLIC POLICY

Trends in East-West technology transfer 

Seymour E. Goodman, Chair, COPP Technology Transfer Subcommittee 


[Editor’s note: The following abstract 
is taken from a statement made before the 
Joint Economic Committee, Subcommit¬ 
tee on Technology and National Security, 
US House of Representatives and US Senate,
on April 20, 1990. Goodman made
the statement as an individual academi¬ 
cian rather than as a representative of 
COPP.] 

Important trends and changes are oc¬ 
curring in the practice of East-West tech¬ 
nology transfer. Most of the observations 
and conclusions in this article are derived 
from studies of Soviet computing over 
the last dozen years, but I suspect that 
many are applicable to other technolo¬ 
gies as well. 

The Soviets have been pursuing US 
computer technology and products for 
more than 40 years. Nevertheless, US- 
USSR gaps in these areas have increased. 
These gaps result from three general fac¬ 
tors that collectively have retarded com¬ 
puter-related technology transfer from 
the West to the Soviet Union for decades: 

• US/CoCom export controls. For example,
controls by the US and member
countries of the Coordinating Committee
for Multilateral Export Controls (CoCom)
on the sale of manufacturing know-how
and equipment have contributed to
the inability of the USSR to develop much
stronger microelectronics and disk-storage
manufacturing industries.

Figure 1. Technology transfer mechanisms.

• Limited business opportunities for 
Western companies. For example, re¬ 
strictions on direct access to Soviet mar¬ 
kets, bureaucratic constraints on the con¬ 
duct of business, and limitations on the 
removal of hard-currency profits from 
the USSR have discouraged many West¬ 
ern firms from pursuing business oppor¬ 
tunities in the Soviet Union on a long¬ 
term, profitable basis. 

• Self-imposed constraints. For ex¬ 
ample, the long-standing stranglehold of 
the KGB and the GRU (Soviet military in¬ 
telligence) on technology collection, the 
extreme protectionism for Soviet firms in 
the form of national security controls, the 
lack of a convertible currency, travel re¬ 
strictions, and poor mechanisms for 
internal technology transfer have all ad¬ 
versely affected development of the So¬ 
viet computer industry. 

It is impossible to analyze in detail the 
relative importance of these three fac¬ 
tors. However, I believe that all three 
have been important and collectively 
have severely retarded Soviet develop¬ 
ments. This is illustrated in Figure 1. 

Each technology transfer mechanism can
be seen as occupying a position on the
graph. For example, reading the open
technical literature would be a mechanism
located in the overt-passive quadrant.
Building a turnkey plant with strong
feedback processes between the provider
and the receiver would be located in the
overt-active quadrant. Generally speaking,
passive mechanisms are weaker than
active mechanisms, and covert transfer
is often riskier and weaker than overt
transfer.

Figure 2. Impact of international trends
on East-West technology transfer.

If we consider the three major retard¬ 
ing factors, it becomes immediately ap¬ 
parent that all three work together to push 
the practice of Soviet technology transfer 
toward the passive and covert sides of the 
graph. For most of the past 40 years, the 
distribution of Soviet technology trans¬ 
fer practice has been strongly skewed to¬ 
ward the covert-passive quadrant. 

Clearly, this is the least efficient kind of 
distribution. 

It is not difficult to think of significant 
possibilities for progress if one or more of 
the retarding factors were removed or 
greatly relaxed. With that in mind, let’s 
briefly consider four major international 
technical/economic/political trends: 

• Expansion and filling-in of the 
technological spectrum. New materials 
for microelectronic devices and new 
architectures for high-performance com¬ 
puting, for example, now provide a wider 
choice of cost/performance possibilities. 

• Globalization. The spread of ad¬ 
vanced, or near-advanced, technology 
around the world, and especially to Asia, 
has been remarkable. The net result is 
many more non-US and non-CoCom 
sources for an increasing variety of tech¬ 
nologies and products. 

• Commoditization. An increasing 
number of high-technology products are 
adding a very important “commodity” 
phase to their life cycles. The term can be 
used in much the same sense as it applies 
to grain commodities like wheat — that 
is, products produced in very large vol¬ 
umes and at low prices, with strong forms 
of substitutability, spot markets, etc. 

Such products tend to be easy to obtain 
and redistribute in large quantities. 
• Economic and political reforms in
communist countries. Much of the So¬ 
viet program of perestroika (and its local 
equivalents elsewhere) is intended to im¬ 
prove the economic and political envi¬ 
ronment for the development and appli¬ 
cation of technology. So, for example, 
travel abroad for scientific and techno¬ 
logical purposes and to try to establish 
joint ventures is being encouraged in the 
USSR. 

It should be obvious that every one of 
these trends weakens one or more of the 
major factors retarding technology trans¬ 
fer. The net result is a significant change 
in the distribution of the practice of tech¬ 
nology transfer and product acquisition 
by the Soviet Union. In particular, the re¬ 
sult is a much increased use of the more 
active and overt (and therefore more ef¬ 
fective) technology transfer mechanisms 
(see Figure 2). 

The first three of these international 
trends are primarily achievements of the 
world outside the USSR. They are, how¬ 
ever, so powerful and pervasive that re¬ 
laxation of the self-imposed constraints 
by the USSR has provided and will con¬ 
tinue to provide an unprecedented flow of 
technology and products into that coun¬ 
try, although this flow remains small by 
West-West standards. On the other hand, 
many of the self-imposed retardants still 
exist and have been exacerbated by the 
Soviet economy’s current shaky and un¬ 
certain state. 

The US dilemma. Some of this flow, of 
course, goes directly into the Soviet mili¬ 
tary and military-industrial sectors. 

There is, however, an unprecedented 
broad-based, modest-technology-level, 
low-cost, and relatively high volume 
transfer into the Soviet general economy. 
Therein lies an awkward problem for the 
US. 

The primary reason the Soviets have 
sought to transfer Western technology to 
their military and military-industrial sec¬ 
tors is because these needed technologies 
were not available from the Soviet gen¬ 
eral economy. In this sense, the US and 
other Western high-technology econo¬ 
mies served as surrogates for what the 
Soviets lacked at home. 

In the days before glasnost and per¬ 
estroika, when export controls were more 
effective than most Soviets would care to 
admit, the other factors combined to 
make it relatively easy to pursue controls 
on dual-use technologies to the extent 
that they severely hurt the Soviet general 
economy, US disclaimers on practicing 
economic warfare notwithstanding. 

Over the long term, this may well have 
been the most important impact of export 
controls and quite possibly a driving 
force that helped bring about per¬ 


estroika. From a fairly narrow military- 
technological national security point of 
view, building up the technology base of 
the Soviet general economy is probably 
the worst thing we could do. 

However, life is more complicated 
these days, and with a broadening percep¬ 
tion of our national security, it is our 
stated intention to try to help perestroika 
succeed. One way to help is to broadly 
support development of the Soviet gen¬ 
eral economy’s technological base. 

Since many of the technologies and prod¬ 
ucts the Soviets need most are dual-use 
technologies such as microelectronics, 
computers, and telecommunications 
equipment, there is no way that we can 
significantly aid perestroika without 
also helping to develop the Soviet Un¬ 
ion’s military-technological capability. 

A positive approach. A partial resolu¬ 
tion of this dilemma is to recognize that 
there are some supply-side considera¬ 
tions, both explicit and implicit, in the 
four international trends listed above. 
These considerations make effective 
control much more difficult than in the 
past. In particular, the rates of growth inherent
in the first three trends are greater
than any conceivable growth in the resources
that might be devoted to increasing
controls in today’s political climate.
Therefore, we should consider taking a
positive and constructive approach to
selling many information technology
products in the hope of helping bring
about a more open and democratic Soviet
Union.

In one important and relevant way, life
has become more complicated for the Soviets.
The same international trends making
possible a greater technology flow to
the Soviet Union have made it much
harder for the Soviets to establish themselves
as a truly modern, technologically
competent, and economically sound
country. They must now do more to
achieve this than they needed to do 10-20
years ago.

Further information available. For
additional information, see the December
1988 National Research Council/
National Academy of Sciences Committee
to Study International Developments
in Computer Science and Technology
book Global Trends in Computer Technology
and Their Impact on Export Control.
It provides more details and a long
and comprehensive assessment of the
state of Soviet computing. Information
collected since its publication provides
additional support.


Board of Governors approves slate of
candidates for fall election

The IEEE Computer Society Board of
Governors has approved the slate of candidates
for president-elect, first vice
president, second vice president, and
seven board posts that will be filled by
membership vote this fall. The board took
action on the slate when it met in San
Francisco June 8 during IEEE Infocom
90, the ninth Conference on Computer
Communications.

Those elected, including the seven
candidates receiving the greatest number
of votes among the 12 Board of Governors
nominees, will begin their terms
January 1, 1991.

The candidates are as follows:

1991 president-elect (1992 president,
1993 past president)

(One elected)

Bruce D. Shriver
Joseph E. Urban

1991 first vice president

(One elected for one year)

Paul L. Borrill
Gerald L. Engel

1991 second vice president

(One elected for one year)

Mario R. Barbacci
Barry W. Johnson

1991-93 Board of Governors

(Seven elected for three years)

Fiorenza Albert-Howard
Joseph Boykin
Jon T. Butler
Michel Israel
Michael C. Mulder
Yale N. Patt
Charles B. Silio
Donald E. Thomas
Anneliese von Mayrhauser
Benjamin W. Wah
Ronald Waxman
Akihiko Yamada

Additional nominees can be included
on the ballot by membership petition (see
Computer, February 1990, pp. 73-74, for
a complete list of Computer Society requirements
for petition candidates).

Petition candidates must submit their
petition signatures (1,000 voting members
for officer nominees and 250 voting
members for Board of Governors nominees)
and their biographical data, photographs,
and position statements to the
Computer Society secretary at the following
address on or before July 31,
1990: David Pessel, BP Sunbury Research
Centre, Chertsey Road, Sunbury-on-Thames,
Middlesex, TW16-7LN,
England.

Candidates’ statements and biographical
data will be published in the September
1990 issue of Computer and will be
included in the September 1, 1990, IEEE
ballot mailing. Length limitations for
these materials are as follows:

Candidates’ statements

President-elect: 350 words

Vice president: 250 words

Board of Governors: 150 words

Biographical data

All candidates: 200 words

Biographical sketches should cover
the topics in the following sequence:

• Computer Society activities

• other professional activities

• current employment, professional
experience, and accomplishments

• degree(s) and major(s)

• awards and honors

Nominees should also submit black-and-white
passport-type photographs of
themselves for publication along with
their statements and biographies.


Software engineering TC takes action on variety
of issues affecting members and industry

Peter Freeman, Chair

Laurie Werth, Vice Chair Education

Technical Committee on Software Engineering


The declining enrollment of computer 
engineering students, the shortfall of 
quality academic programs in computer 
science education, and inequitable repre¬ 
sentation of various societal groups in 
computer science and engineering are 
among the current concerns of the IEEE 
Computer Society’s Technical Commit¬ 
tee on Software Engineering. In addition, 
the TCSE continues to serve members 
and the industry through its ongoing ac¬ 
tivities in conferences, standards, and 
technical education. 

Education. Computer science educa¬ 
tion and software engineering face two 
major challenges: decreasing student en¬ 
rollment and a dearth of quality academic 
programs. Faculty shortages seem to be 
easing slightly, but a national trend to¬ 
ward fewer computer science and com¬ 
puter engineering graduates and under¬ 
graduates does not bode well. In its April 
1989 report, the Congressional Office of 
Technology Assessment estimated the 
deficit of software professionals at be¬ 
tween 50,000 and 100,000, and predicted 
that the deficit will grow. This situation is 
exacerbated by “a serious shortage of rig¬ 
orous software engineering programs in 
US colleges and universities.” 


A number of corrective efforts are on¬ 
going. For example, the TCSE has been 
trying to address the critical issue of edu¬ 
cation in software engineering by coop¬ 
erating in the curriculum efforts of the 
IEEE/ACM, the Software Engineering 
Institute, and the International Federa¬ 
tion of Information Processing. The most 
significant work in defining software en¬ 
gineering curricula is being done at Car¬ 
negie Mellon University’s Software En¬ 
gineering Institute, which plans yearly 
reports on curricula. SEI holds educator 
development workshops for academic 
and industry educators and has devel¬ 
oped a series of curriculum modules that 
can be assembled into a variety of 
courses. A directory of software engi¬ 
neering course offerings at most US uni¬ 
versities is available. 

A software engineering master’s de¬ 
gree program at Carnegie Mellon began 
in September. Several courses have al¬ 
ready been taught and are available on 
videotape through the Videotape Dis¬ 
semination Project. The Technology 
Transition Group has several related 
projects, including planning for technol¬ 
ogy transition and adoption of software 
engineering innovations in organiza¬ 
tions for industry affiliates. The Software 
Process Assessment and Software Capa¬ 
bility Evaluation Projects have outlined 
activities and a methodology for integrat¬ 
ing software engineering into a software 
organization. These reports and materi¬ 
als are available from the Software Engi¬ 
neering Institute, Carnegie Mellon Uni¬ 
versity, Pittsburgh, PA 15213. 

Also substantially affecting computer 
science education are the Computer Soci¬ 
ety/ACM Joint Task Force on the Under¬ 
graduate Curriculum and the joint Com¬ 
puter Society/ACM Computer Science 
Accreditation efforts. The former has 
drawn up descriptions of about 300 lec¬ 
ture hours of knowledge units in 10 broad 
areas. The final report will propose a vari¬ 
ety of curricula that can be constructed 
from these units. On the accreditation 
front, the number of Computer Society 
Accreditation Board and Accreditation 
Board for Engineering and Technology- 
accredited computer science programs 
should grow to about 80 by the end of the 
1990-91 cycle. 

Office automation TC seeks volunteers

The IEEE Computer Society
Technical Committee on Office
Automation, a regular sponsor of
the Conference on Office Automation
Systems, is interested in further
identifying timely and innovative
topics in office automation and possibly
sponsoring related activities.
In particular, the TC seeks volunteers
interested in multimedia data
in office information systems.

The TC publishes a newsletter,
Office Knowledge Engineering,
that covers current research in information
management and social science
technology for offices. Topics
include multimedia systems, office
models, knowledge representation,
management, and workstations and
desktop management systems.

Interested persons should contact
Vincent Lum, Chair, Code 52,
Naval Postgraduate School, Monterey,
CA 93943, phone (408) 646-3091;
or David Choy, IBM Almaden
Research Lab, MS K56801,
650 Harry Rd., San Jose, CA 95120-6099,
phone (408) 927-1846.


The pervasiveness of software-based
systems and their potential problems, as
well as their benefits, to society have
brought ethics, the teaching of software
engineering, and the certification of software
professionals to the fore. In the
January 1990 issue of Computing Research
News, John Cherniavsky of
Georgetown University reviewed a recently
released report by the Subcommittee
on Investigations and Oversight of the
House of Representatives Committee on
Science, Space, and Technology entitled
“Bugs in the Program.” Major report rec¬ 
ommendations included a working group 
on software development improvement, 
a NASA-assisted effort to develop and 
implement a software-oriented procure¬ 
ment policy, an NSF focus on the im¬ 
provement of software engineering edu¬ 
cation with stronger ethical norms, and a 
recommendation that the SEI receive 
support from a broader spectrum of fed¬ 
eral agencies. The report warns that fail¬ 
ure in self-regulation by the professional 
societies “may lead to the loss of their 
prized autonomy.” 

Equitable representation. Govern¬ 
ment, industry, and educational organi¬ 
zations have been trying to address the 
inequitable representation of certain 
groups in many social and professional 
organizations. It is now clear that in addi¬ 
tion to the strong moral rationale for pro¬ 
viding equal access, there are economic 
reasons as well. Thus, the focus is cur¬ 
rently on underrepresented social groups 
as obvious candidates to help increase the 
number of students and faculty at gradu¬ 
ate and undergraduate levels. 

While blacks and Hispanics will make 
up almost half the school-age population 
by the year 2000, currently they represent 
only 2 percent of the PhDs in science and 
engineering. Women, who make up half 
the population, constitute only 11 per¬ 
cent of the science and engineering work 
force. 

Nancy Leveson of the University of 
California, Irvine, in her report Women In 
Computer Science: A Report for the NSF 
CISE Cross Directorate Activities Advi¬ 
sory Committee, lists things the NSF and 
others can do to improve the representa¬ 
tion of women (see related article in Com¬ 
puter, June 1990, p. 88). Many of these 
activities are also applicable to improv¬ 
ing minority participation. These and 
other issues are discussed in the TC’s 
quarterly newsletter. 

Conferences. In the past year, the 
TCSE helped sponsor the 12th Interna¬ 
tional Conference on Software Engineer¬ 
ing (ICSE12) in Nice, France, the Fifth 
International Workshop on Software 
Specification and Design (IWSSD-5), 
the CASE 89 symposium, the Annual 
Software Maintenance Conference, the 
Software Process Workshop, and several 


other meetings. TCSE members played 
key roles in organizing these meetings. 

Standards. After conference sponsor¬ 
ship, software engineering standards ef¬ 
forts constitute the TCSE’s largest area of 
activity. This work falls under the Soft¬ 
ware Engineering Standards Subcom¬ 
mittee, directed by John Horch. Nearly 
2,000 people are involved in formulat¬ 
ing, revising, and approving various 
standards. This standards work is ex¬ 
tremely important, since it packages in 
convenient form what we know about 
software engineering, thus facilitating 
the improvement of pragmatic software 
engineering. 

Reorganization. The TC reorganized 
this year. The new structure features six 
vice chairs and a secretary as well as the 
chair. The overall objectives are to pro¬ 
vide some direct, professionally valuable 
service to every member, and to involve 
at least 10 percent of our approximately 
2,000 members in some direct activity 
through which they can contribute to the 
field. The reorganization is intended to 
help reach these objectives. 

Other activities. Two other recent ac¬ 
tivities include the formation of a relia¬ 
bility engineering subcommittee, with 
Frank Ackerman of AT&T Bell Labora¬ 
tories as chair (see Computer, June 1990, 
p. 89) and the development of IEEE fel¬ 
low nominations from the software engi¬ 
neering field. 

Plans are under way to broaden future 
meetings. For example, the next TCSE 
meeting, set for May 1991 in Austin, 
Texas, will offer workshops for small 
groups to discuss topics of mutual inter¬ 
est, a change in format for tools presenta¬ 
tions, and the potential for videotape 
presentations. Additional suggestions 
are solicited. Several initiatives will ad¬ 
dress the need for better software engi¬ 
neering education and training, and for 
strengthening standards activities. 

One of the largest Computer Society 
technical committees, the TCSE’s gen¬ 
eral focus is on activities designed to 
strengthen the discipline and practice of 
software engineering. Its size and the di¬ 
versity of its activities enable it to pro¬ 
vide an umbrella for almost any effort ap¬ 
propriate to a professional society with 
members whose activities range from ba¬ 
sic research to management. 

Readers with a professional interest in 
software engineering are encouraged to 
join the TCSE or to renew existing mem¬ 
bership. There are no dues. Those inter¬ 
ested may contact the Computer Society 
office in Washington, DC, or Elliot 
Chikofsky, Index Tech Corp., One Main 
St., Cambridge, MA 02142.


Software in the Real World 

Easy to use is easy to say. In the 
real world, software developers 
can’t assume the user will do as 
they expect or that their software 
will meet the user’s expectations. 
IEEE Software looks at human 
factors from your perspective: 
How do you find out how users 
think? How do you determine 
what they want from a software 
product? Each issue, Kathleen 
Potosnak shows you how to think 
about — and like — a user. And 
why you should. Our Human Fac¬ 
tors department can help give 
your software an ergonomic edge. 
We’re what a technical magazine 
should be: Practical. Authorita¬ 
tive. Lucid. Direct. 

For subscription information, write IEEE 
Software, 10662 Los Vaqueros Cir., PO Box 
3014, Los Alamitos, CA 90720-1264; call 
(714) 821-8380; or use the reader-service 
card. 

Software 

The state of the art 
about the state of the practice. 

Programmer’s editors — a second look 


Daniel McAuliffe 

In the May 1989 column (pp. 89-93), I 
reviewed a number of programmer’s edi¬ 
tors intended for the DOS environment. 
Two of those editors have recently 
undergone significant upgrades and de¬ 
serve a second look. 


Multi-Edit Professional 

One of the first things you will notice 
when you receive Multi-Edit Profes¬ 
sional, Version 4.00Pb, is the lack of a 
written manual. This clues you in to one 
of its most outstanding new features, a 
hypertext-style help facility. All of the 
documentation for Multi-Edit now comes 
on line, including a complete reference 
manual for the macro language. At first 
the lack of a manual seemed rather awk¬ 
ward, but I soon found the help system 
much more convenient than any written 
document. The cross-referencing and in¬ 
dex capabilities easily outdistance 
thumbing through a manual. For those of 
you who can’t do without the written 
manual, however, Multi-Edit provides 
facilities for printing the help documents. 
The company also plans to release a writ¬ 
ten manual in the near future. 

The ability to configure Multi-Edit 
features without leaving the editor has 
always been one of the program’s out¬ 
standing capabilities, and the new ver¬ 
sion has taken the process to a new level. 
The number of configuration options has 
expanded, and many of the menus have 
improved significantly. I found the selec¬ 
tion of screen colors much more versatile 
and easy to manage. Each filename ex¬ 
tension can now have a default directory. 
For instance, if you wish to store all of 
your .doc files in the same directory, 
Multi-Edit will load and store a file with 
the .doc extension from that directory 
whenever it cannot find the file in the 
current directory and you do not specify 
a path in the filename. 
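
The lookup rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the review, not Multi-Edit's actual code; the function name `resolve_path` and the `default_dirs` mapping are hypothetical.

```python
import os

def resolve_path(filename, current_dir, default_dirs):
    """Pick the path to open, following the rule described above.

    default_dirs is a hypothetical mapping from a lowercase extension
    (for example ".doc") to that extension's default directory.
    """
    # A path typed as part of the filename always wins.
    if os.path.dirname(filename):
        return filename
    # Next, prefer a file that exists in the current directory.
    candidate = os.path.join(current_dir, filename)
    if os.path.exists(candidate):
        return candidate
    # Otherwise fall back to the default directory for the extension.
    extension = os.path.splitext(filename)[1].lower()
    if extension in default_dirs:
        return os.path.join(default_dirs[extension], filename)
    # No default configured: stay in the current directory.
    return candidate
```

The order of the checks matters: an explicit path overrides everything, and the default directory is consulted only after the current directory comes up empty.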

Multi-Edit now supports different 
compiler configurations for the same 
filename extension. For instance, if you 
use both Microsoft C and Turbo C, you 
can build two separate compiler setups 
for programs with the .c filename exten¬ 
sion. When you wish to compile a pro¬ 
gram, a variable-length list of compiler 
names appears. You can then select the 
compiler of your choice. 

A number of Multi-Edit commands re¬ 
quiring a prompt, such as Search and 
Search/Replace, now maintain a history 
of the last 10 responses to the prompt. If 
you want to repeat a search for an entry 
other than the last, you can bring up the 
history list and simply select the item. 
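
A bounded history of this kind is easy to model. The sketch below assumes that a repeated response moves to the front of the list rather than appearing twice; the review does not say how Multi-Edit treats duplicates, and the class name `PromptHistory` is hypothetical.

```python
from collections import deque

class PromptHistory:
    """Remember the most recent responses to a prompt, newest first."""

    def __init__(self, limit=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._items = deque(maxlen=limit)

    def add(self, response):
        # Assumption: a repeated response moves to the front.
        if response in self._items:
            self._items.remove(response)
        self._items.appendleft(response)

    def items(self):
        return list(self._items)
```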

A multiple file search capability, simi¬ 
lar to the Unix Grep facility, has been 
added to Multi-Edit. If you enter <Alt 
F>, a prompt screen requests the specifi¬ 
cation for the files to be searched, plus 
the search string. The file specification 
allows wildcards, and the search string 
allows regular expressions. After all of 
the specified files have been searched, 
the program displays a list of the files 
containing the search string. Selecting a 
file loads the file into a window, with the 
cursor positioned at the first occurrence 
of the search string. The <Alt G> key¬ 
stroke pops up the list of files, and you 
can then load the next file. You can also 
search files in subdirectories if you wish. 
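
For readers who have not used Grep, the following Python sketch shows the general shape of such a search: a wildcard file specification, a regular-expression search string, and a result list giving the first matching line in each file. It illustrates the idea only; the function name and arguments are hypothetical and say nothing about how Multi-Edit implements the feature.

```python
import fnmatch
import os
import re

def search_files(root, file_spec, pattern, recurse=False):
    """Return (path, line_number) for the first match in each file."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not recurse:
            dirnames[:] = []  # do not descend into subdirectories
        for name in sorted(filenames):
            if not fnmatch.fnmatch(name, file_spec):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                for line_no, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, line_no))
                        break  # only the first occurrence is needed
    return hits
```

Recording the line number of the first occurrence is what lets an editor open the file with the cursor already on the match.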

Multi-Edit has added a Redo command 
to complement the Undo command. Redo 
will reverse your last Undo command. 
At first sight the concept seemed a 
little silly, but it rapidly became an indis¬ 
pensable tool. 
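
The usual way to pair Undo with Redo is to keep two stacks. The sketch below shows that common technique with whole-buffer snapshots; it is not a description of Multi-Edit's internals, and the class name `EditHistory` is hypothetical.

```python
class EditHistory:
    """Two-stack undo/redo over snapshots of a text buffer."""

    def __init__(self, text=""):
        self.text = text
        self._undo = []
        self._redo = []

    def edit(self, new_text):
        self._undo.append(self.text)
        self._redo.clear()  # a fresh edit invalidates pending redos
        self.text = new_text

    def undo(self):
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()
```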

You can now perform addition, sub¬ 
traction, multiplication, and division on 
the numbers inside a marked block. The 
result of the operation then moves to the 
current cursor location. 
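
As a rough picture of what block arithmetic involves, the Python sketch below pulls the numbers out of a block of text and combines them from left to right. The left-to-right order and the treatment of a leading minus sign as part of a number are assumptions; the review does not say how Multi-Edit handles either.

```python
import operator
import re
from functools import reduce

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def block_arithmetic(block, op):
    """Combine every number found in the block, left to right."""
    numbers = [float(token) for token in NUMBER.findall(block)]
    if not numbers:
        raise ValueError("no numbers in the marked block")
    return reduce(OPERATIONS[op], numbers)
```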

The DOS directory features have been 
considerably enhanced. Two separate 
display modes are now available, and 
you can display as many as four directo¬ 
ries at the same time. Simple keystrokes 
let you switch between directories, create 
a new directory, and delete a directory. 
The directory displays can also be indi¬ 
vidually sized and moved around the 
screen and can even overlap one another. 


Multi-Edit Professional contains a 
complete communications subsystem for 
use with an existing modem. The module 
is implemented with macros that you can 
customize, although I found the original 
state of the subsystem very satisfactory 
and quite similar to Procomm in its 
command structure. (See the February 
1990 column, pp. 79-87, for a review of 
Procomm.) 

A spell checker with a standard dic¬ 
tionary of more than 80,000 words is 
now available. You can also add two 
auxiliary dictionaries of your own. The 
spell checker always worked fine when I 
used it, but it seemed rather slow com¬ 
pared with spell checkers in larger word 
processors. 

Multi-Edit has so many outstanding 
features, it’s impossible to include them 
all. To appreciate its capabilities, you 
will have to try the program yourself. I 
strongly urge you to do so. I have used a 
number of programmer’s editors in the 
DOS environment over the years, and in 
my opinion Multi-Edit is easily 
among the best available. 

Multi-Edit Professional, Version 
4.00Pb, is available from American Cy¬ 
bernetics, 1228 N. Stadem Dr., Tempe, 
AZ 85281, phone (602) 968-1945. The 
retail price is $179. A standard version 
of the editor is also available for $99, but 
it does not include the spell checker, 
communications module, or macro 
source code. 

BRIEF 

BRIEF, Version 3.0, is the latest re¬ 
lease of one of the most popular and ca¬ 
pable programmer’s editors available. 

The new version adds a number of sig¬ 
nificant features, including the ability to 
zoom a window to full-screen size and 
back again to the original size. You can 
now save keystroke macros to disk and 
recall them later. Version 3.0 also offers 
smart indenting and template editing for 
a number of different languages, such as 
Ada, Fortran, Pascal, C, and Basic. 

Template editing refers to the ability 
of the editor to automatically complete 
frequently used language statements. 
Template editing will finish the names of 
some statements and add the appropriate 
braces, parentheses, semicolons, and 
other punctuation. For example, if you 
are programming in the C language and 
type the character “d” followed by a 
space, the editor will expand the line 
to look like 

    do 
    { 
        X 
    } 
    while (); 

and position the cursor where the X 
appears. 
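
Template expansion of this kind amounts to a table lookup plus a cursor position. The Python sketch below models it with a single entry taken from the example above; the table layout, the function name, and the use of "X" as a cursor marker inside the table are assumptions made for illustration, not BRIEF's own mechanism.

```python
CURSOR = "X"

# Only the "d" entry comes from the review; the structure is hypothetical.
TEMPLATES = {
    "d": "do\n{\n    X\n}\nwhile ();",
}

def expand_template(typed):
    """Expand an abbreviation followed by a space.

    Returns the resulting text and the cursor offset within it.
    """
    if not typed.endswith(" "):
        return typed, len(typed)
    template = TEMPLATES.get(typed[:-1])
    if template is None:
        return typed, len(typed)
    offset = template.index(CURSOR)
    # Remove the marker so the cursor sits on an empty, indented line.
    return template.replace(CURSOR, "", 1), offset
```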

The most significant change between 
versions 2.11 and 3.0 of BRIEF involves 
the macro language. BRIEF continues to 
support the original Lisp-like macro syn¬ 
tax, so you can continue to use any mac¬ 
ros you might have written, acquired 
from bulletin boards, or obtained from 
other vendors. BRIEF also supports a 
new macro language called CBRIEF, 
whose syntax resembles the C language. 
This welcome improvement makes macro 
development much easier, at least 
for the C programmer. 

CBRIEF can be viewed as a subset of 
the C language, with the following dif¬ 
ferences: 

(1) CBRIEF supports only integer and 
string data types. The concept of typedefs 
has no meaning in CBRIEF, and it 
does not support pointers, structures, or 
even arrays. 

(2) The “break” statement assumes a 
limited meaning in CBRIEF. It can only 
be used to terminate a “for,” “do,” or 
“while” statement and cannot be used in¬ 
side a “switch” statement. The language 
does not allow for the flow of control 
from one switch case into another; 
therefore, a “break” is always implied. 

(3) CBRIEF has no “goto” statement, 
will not let you use the “#define” directive 
with parameters, and does not 
support the “sizeof” operator. 

These and other differences should not 
significantly affect your ability to pro¬ 
duce very powerful macros. 

You can convert any of your older 
BRIEF macros to the new CBRIEF 
macro language with a utility supplied on 
the distribution disk. The Operates utility 
will produce a macro source file with the 
original set of comments intact and the 
indentation levels unchanged. The new 
macros can then be recompiled from the 
new source file. Since the conversion 
utility does very little error checking, 
you must make sure you have fully tested 
and debugged the macro you want to 
convert. 

The documentation package has 
changed little from version 2.1. It con¬ 
sists of a user’s guide, a macro language 
guide, and a quick-reference guide. The 
user’s guide contains an extensive tuto¬ 
rial to help you get started with the pro¬ 
gram. A command reference section lists 
each of the BRIEF commands along with 
the default key assignment and an expla¬ 
nation of the command function. The 
macro language guide has been expanded 
to cover both the original language and 
the new CBRIEF language. 

BRIEF remains an outstanding editor 
for professional programmers. With the 
addition of the CBRIEF macro language, 
the C programmer, at least, will find 
macro development much smoother. 

BRIEF, Version 3.0, is available from 
Solution Systems, 541 Main St., Suite 
410D, South Weymouth, MA 02190, 
phone (617) 337-6963. It retails for 
$199. 

A new implementation of a solid old text formatter 


Richard Tenney 

SoftQuad Publishing Software 
(SQPS), marketed by Mortice Kern Sys¬ 
tems, derives from AT&T’s Troff, one of 
the oldest text formatters around. Like 
most Unix systems that include Troff, 
SQPS includes versions of various pre¬ 
processors: Eqn to produce mathematical 
equation displays and Tbl to produce 
tabular displays. In addition, it includes 
the less common Pic for producing line 
drawings and Grap for graphs. To say 
that SQPS derives from Troff does not 
convey the full story: an upgrade of that 
product, it can successfully process old 
Troff files. 


Why consider a text 
formatter? 

It might help at this point to differentiate 
between word processors and text 
formatters. Essentially, a word processor 
combines the functions of an editor and a 
formatter into one interactive package, 
normally giving you some approximation 
of what the final, printed document will 
look like. You never see the internal rep¬ 
resentation of the document. A text for¬ 
matter, on the other hand, generally runs 
in a batch mode, processing a file that 
combines text with embedded formatting 
commands. Typically, users of format¬ 
ters print their documents several times 
before they are satisfied with the results. 
Consequently, one wonders why, in the 
day of fancy word processors, anyone 
would tolerate a text formatter. 

Let me, a constant user of text format¬ 
ters, try to answer that question in my 
idiosyncratic way. First of all, I have 
learned well over 20 different editors in 
my years of programming and writing. I 
have finally found one that is available 
on all the machines I use, and I have decided 
to stick with it. If I were to use a 
word processor, I would have to learn yet 
another editor. Worse, if I were to use 
word processors on several different ma¬ 
chines, I would have to learn several dif¬ 
ferent editors and several different sets 
of formatting commands. 

Second, when dealing with large docu¬ 
ments (like a book), many word proces¬ 
sors simply run out of steam. Often, they 
require that the text comprise only one 
file if you want to produce an index or 
table of contents, then they limit the size 
of the file. Sometimes, the index or table 
of contents feature fails when the book 
becomes too long. The interactive fea¬ 
tures suddenly become intolerably slow. 
Imagine the computation required to re¬ 
paginate a two-hundred-page document 
when you insert or delete a word near the 
beginning. Of course, you can turn re¬ 
pagination and other complex features 
off, and pretty soon you have converted 
your word processor into a text formatter. 

Third, the text processed by the for¬ 
matter is an ordinary text file, so word 
count programs, spelling programs, text 
analysis programs, etc. can all process 
the text. Since the text contains no spe¬ 
cial characters, it can safely be sent over 
networks. 

Finally, just as real programmers don’t 
write Pascal, real documenters don’t use 
word processors. 

SQPS product features 

The SQPS version of Troff, called 
Sqtroff, performs better than any other 
version of Troff I have used. In produc¬ 
ing a large document last year, I found 
that I had to make various allowances for 
failings in the Troff system I was using 
(mainly failures to align certain things). 
Running the same text through the SQPS 
system, I found that I had to remove the 
little adjustments I had made, since the 
system actually performed as advertised. 

I tested the formatter, the table processor 
Sqtbl, and the equation processor Sqeqn, 
but not the graph processor Sqgrap or the 
picture processor Sqpic. 

Users already familiar with Troff 
might prefer to invoke the various pre¬ 
processors explicitly or with an existing 
Make file (modified to account for 
SoftQuad’s names, which begin with 
“sq”). However, SoftQuad provides a 
driver, Tr, that, with suitable arguments, 
will invoke the preprocessors in the ap¬ 
propriate order. Furthermore, Tr will in¬ 
voke the required postprocessors that 
convert SoftQuad’s intermediate lan¬ 
guage, called “Context” (the output of 
Sqtroff), into commands to drive the cho¬ 
sen printer. For example, Sqdps converts 
Context into Postscript. If you use a 
Hewlett-Packard Laserjet, you can use 
either the B or the F cartridge, or you 
might prefer to use the SoftQuad pro¬ 
gram that downloads the provided Com¬ 
puter Modern fonts as needed. The 
downloading seems quite intelligent and 
goes quickly. 
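
To make the idea of a driver concrete, here is a small Python sketch that assembles a pipeline from the program names mentioned in this review. Several things in it are assumptions: the lowercase executable names, the absence of any command-line flags, and the stage order (graphs, pictures, tables, equations, formatter, postprocessor), which follows the traditional Troff convention and has not been checked against what Tr actually does.

```python
# Assumed order, borrowed from traditional Troff usage.
PREPROCESSOR_ORDER = ["sqgrap", "sqpic", "sqtbl", "sqeqn"]

def build_pipeline(source, wanted, postprocessor="sqdps"):
    """Return a shell-style pipeline string for the chosen stages."""
    unknown = set(wanted) - set(PREPROCESSOR_ORDER)
    if unknown:
        raise ValueError("unknown preprocessor: %s" % sorted(unknown))
    # Keep the canonical order no matter how the caller listed them.
    stages = [name for name in PREPROCESSOR_ORDER if name in wanted]
    stages += ["sqtroff", postprocessor]
    first, rest = stages[0], stages[1:]
    return " | ".join(["%s %s" % (first, source)] + rest)
```

The point of a driver is exactly this bookkeeping: the user names the features a document needs, and the driver works out the order.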

Most text formatters are rarely used 
without macro packages to enhance their 
capabilities. Because of its age, Troff has 
quite a few sets of macros, some of them 
well known and extensively used. Sev¬ 
eral macro packages come with SQPS, 
including the Man macros for producing 
Unix-style manual pages and the Mm 
macros, originally meant for producing 
formal memoranda and papers at Bell 
Labs, but also usable for simple letters 
and, suitably modified, for entire books. 
Other macro sets aid in producing a per¬ 
muted index and transparencies. 

One of the manuals describes the Mm 
macros in great detail. Clearly, the company 
intends you to use this package. 
The macro packages come in two forms: 
with and without extensive comments. 
The version with comments will help the 
novice who wants to learn the subtleties 
of Troff and the expert who wants to 
modify the way the macro package be¬ 
haves. The version without comments is 
the one generally used, as it can be pro¬ 
cessed much more quickly. Anyone who 
uses previously developed macros will 
probably have to turn off the warnings 
that Sqtroff generates when no space 
separates a request and its argument. 
Anyone developing new macro packages 
will appreciate the enhanced error mes¬ 
sages and the debugging and tracing fa¬ 
cilities of this system. 

Now let me get to some of the annoyances. 
First, let’s talk about the installation. 
MKS provides an Install program 
that manages most of the installation. It 
has some bugs, though. For example, it 
prompts “At what device is your printer 
connected? ... Leave this blank if you 
want your output to [sic] the standard 
output.” However, it refuses to go on if 
you leave the field blank. It refused to 
install the HP Computer Modern fonts, 
saying “This package requires 4280 
blocks (2191360 bytes) of disk space 
free ... c:/usr/sqps has only 4104 blocks 
(2101248 bytes) free disk space.” How¬ 
ever, MKS’s own Df program showed 
8,208 blocks free, and MS-DOS’s Dir 
showed 4,202,496 bytes free. Eventually, 
I installed these fonts using the Cpio pro¬ 
gram that came with the system. From 
this you may correctly infer that SQPS is 
a big system. Its actual size depends on 
the number and kind of printers you in¬ 
tend to support. The basic package, with 
support programs and Nroff driver, takes 
almost 2.5 Mbytes of disk storage. The 
Postscript supplement, with previewer, 
requires just over 500 Kbytes, but the HP 
Laserjet Plus supplement with its Computer 
Modern fonts requires more than 
3.5 MBytes of disk. 
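
The numbers quoted from the Install program are worth checking by hand. The short Python sketch below redoes the arithmetic; the 512-byte block size is not stated anywhere in the messages but follows from dividing the quoted byte counts by the quoted block counts.

```python
BLOCK_SIZE = 512  # implied by 2191360 bytes / 4280 blocks

def blocks_to_bytes(blocks):
    return blocks * BLOCK_SIZE

required = blocks_to_bytes(4280)      # what Install said it needed
install_free = blocks_to_bytes(4104)  # what Install said was free
df_free = blocks_to_bytes(8208)       # what Df said was free

# Df agrees with the Dir figure quoted above; Install reports half.
fits_according_to_install = required <= install_free
fits_according_to_df = required <= df_free
```

Df's block count is exactly twice the figure Install reported, and it matches the byte count from Dir, so the fonts would have fit with room to spare.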


The previewer gives you an idea of 
what the final page layout will look like 
without printing it. Unlike some other 
systems, the previewer does not allow 
you to select part of the page to enlarge, 
so if you want to see fine detail, such as 
whether two characters happen to touch, 
you will have to print the page. Further¬ 
more, you must install the previewer 
manually, altering the configuration file. 
If you then install a new printer using the 
Install program, it will overwrite the cus¬ 
tomized configuration file without warn¬ 
ing. Similarly, if you intend to support 
both HP Laserjet printers and Postscript, 
the configuration file will reflect only the 
last one you install. You will have to fig¬ 
ure out (from skimpy documentation) 
how to modify this file. 

The manuals are another mixed bless¬ 
ing. Having used Troff for years, I can 
appreciate the amount of work that these 
new manuals represent. They improve 
considerably over the original AT&T 
manuals in format, organization, and 
content. However, they contain errors 
and omissions, and it is sometimes diffi¬ 
cult to find exactly what you are looking 
for. Of course, my requirements might be 
more sophisticated than those of most 
users, because I have used Troff for so 
long. However, some of the errors are 
just silly. My favorite is this: In discussing 
the Zapf Dingbats character set 
(I did not invent this name!), the documentation 
for the Postscript driver confusingly 
names the symbol for clubs (♣) 
“SPADES” and the symbol for spades (♠) 
“CLUBS.” 

Because MKS only markets SQPS, 
some surprising lapses occur. First of all, 
some of the documentation cross-referenced 
in the delivered documentation is 
missing. You do not receive Getting 
Started, nor the Product Overview mentioned 
in the Release Notes. Nor does 
MKS include the volume Text Formatting — An 
Introduction. You can report 
errors in the main manuals to MKS, but 
they merely pass them on to SoftQuad. 
Similarly, MKS technical personnel, normally 
quite helpful with their Toolkit 
products, cannot give you much help 
with SQPS. This results in questions that 
remain unanswered for long periods. 

Finally, the biggest problem I have 
with the package is this: There are actu¬ 
ally two versions of Troff, one for driv¬ 
ing photo typesetters and laser printers 
with multiple fonts and all sorts of fancy 
options, and the other, known as Nroff, 
for driving simple, fixed-character-width 
printers. Substitute strings replace com¬ 
mon characters that the simple printer 
cannot make. So, for example, the two-character 
string “<=” might substitute for 
the character “≤.” With a little care, 
you can often process the same text file 
with either version of the program. 
Thus, you can use a simple printer for 
early drafts of the work, graduating to a 
fancy printer near the end. 

Unfortunately, SoftQuad’s support for 
Nroff appears half-hearted. It leaves out 
almost all of the interesting characters 
from the device description file, produc¬ 
ing a warning when you try to print one 
of them and leaving you the rather 
messy task of providing descriptions of 
those characters you need. 
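
The substitution scheme, and the warning you get when a character has no description, can be pictured with a small lookup table. In the Python sketch below only the "<=" entry comes from the example above; the other entries, the "?" stand-in, and the function name are hypothetical and do not reflect the format of a real device description file.

```python
# Only the first entry is taken from the review's example.
SUBSTITUTES = {
    "\u2264": "<=",   # less-than-or-equal sign
    "\u2265": ">=",   # greater-than-or-equal sign (hypothetical entry)
    "\u2260": "!=",   # not-equal sign (hypothetical entry)
}

def to_fixed_width(text, table=SUBSTITUTES):
    """Replace special characters with plain-ASCII substitute strings.

    Returns the converted text and a sorted list of characters that
    had no entry, which is where a formatter would issue a warning.
    """
    missing = set()
    pieces = []
    for ch in text:
        if ch in table:
            pieces.append(table[ch])
        elif ord(ch) < 128:
            pieces.append(ch)
        else:
            missing.add(ch)
            pieces.append("?")
    return "".join(pieces), sorted(missing)
```

Every character that lands in the missing list is one more description you would have to supply yourself.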


Summing up 

Let me qualify SQPS this way: This is 
a nice software package with good docu¬ 
mentation. You will need an AT-class 
machine with a full 640 Kbytes and a 
hard disk drive along with DOS 3.0 or 
higher. A RAM disk is highly recom¬ 
mended. You might find the package 
somewhat expensive when you consider 
that, in addition to the base price of $495 
(single-user DOS license), you pay $200 
each for the HP LaserJet and Postscript 
drivers. Still, I found only a few flaws, 
greatly outweighed by the quality of the 
product. 

Contact Mortice Kern Systems, 35 
King St. N., Waterloo, Ontario, Canada 
N2J 2W9, phone (519) 884-2251 or 
(800) 265-2797 (for orders), fax (519) 
884-8861. 

Using a tape drive for backup 


T.L. (Frank) Pappas 

If you have a large amount of disk 
storage, backing up your files onto 
floppy disks can take too much time, es¬ 
pecially if you create or modify many 
files every day. Mountain Computer 
speeds up this process with their fast and 
reliable 8000+ tape drive. The manufac¬ 
turer claims a data rate of 500 Kbps if 
you connect the 8000+ to your floppy 
controller, and 1 Mbps if you use the 
Mach2 MT-101 controller board. The 
company claims a hard error rate of 1 in 
10^14 bits read and a soft error rate of 1 in 
10^6 seeks. You can format tapes for 124 
MBytes of storage, or 152 MBytes if you 
use the Mach2. 

In using the 8000+ with the Mach2 
board, I backed up several different par¬ 
titions from my hard disk (a 28-ms Priam 
ID330) with varying results. I used a Zenith 
386 system at 16 MHz. On a 15.6-MByte 
partition with 643 files, the files 
were written to tape at a rate of 5.2 
MBytes/minute. On a partition with 14.4 
MBytes and 266 files, and a partition 
with 28.6 MBytes and 354 
files, the rates were both 5.4 
MBytes/minute. However, on a 
partition with 18.7 MBytes and 
1,668 files, the rate dropped to 4.5 
MBytes/minute. 

[Photo: Mountain Computer’s Mach2 controller boards] 

Verifying that the data was writ¬ 
ten correctly drops the backup rate 
to half, so the rate for the first par¬ 
tition is 2.6 MBytes/minute. You 
also have to allow time for posi¬ 
tioning the tape and updating the 
tape volume table, which can add 
another 1.5 to 2.5 minutes. For 
the first partition, this lowered the 
backup rate from 2.6 to 2.0 MBytes/min¬ 
ute. Even at this rate, it only took 7.7 
minutes to back up the partition. A good 
disk-based backup program will take 
longer, and you’ll have to sit around 
changing disks. 
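
The figures for the first partition hang together, as a quick calculation shows. The Python sketch below is simply the arithmetic from the paragraph above, under the stated assumption that verification doubles the transfer time and that positioning and table updates add a fixed number of minutes; the function name is hypothetical.

```python
def backup_time(size_mb, write_rate, verify=True, overhead_min=0.0):
    """Return (minutes, effective MBytes/minute) for one backup."""
    minutes = size_mb / write_rate
    if verify:
        minutes *= 2  # verification halves the effective rate
    minutes += overhead_min  # tape positioning and volume table update
    return minutes, size_mb / minutes

# First partition: 15.6 MBytes written at 5.2 MBytes/minute.
verified = backup_time(15.6, 5.2)
best_case = backup_time(15.6, 5.2, overhead_min=1.5)
worst_case = backup_time(15.6, 5.2, overhead_min=2.5)
```

With verification alone the backup takes about 6 minutes at 2.6 MBytes/minute. Adding 1.5 to 2.5 minutes of overhead gives a range of roughly 7.5 to 8.5 minutes, which brackets the 7.7 minutes measured.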

All of the times given above were for 
a formatted tape. With the Mach2, you 
can format while writing, 
which added 4 minutes to the 
backup time for the first parti¬ 
tion. I prefer to format the 
tapes before I use them, since I 
want to ensure that the tape is 
in good condition before I start 
using it. It took an hour to for¬ 
mat the tape. 

The backup program is so 
friendly to use that I performed 
a variety of backups and re¬ 
stores without reading the man¬ 
ual. The manual, which could 
be better written, was the only 
really negative aspect of the 
8000+. The print was so small, 
I found it difficult to read for 
more than a few minutes at a 
time. The presentation style 
also makes the manual a little 
difficult to use as a reference. 
Since the program is so easy to use, 
that’s not a problem. 

[Photo: Mountain Computer’s 8000+ tape drive] 

The software provides a great deal of 
flexibility. You can back up and restore 
all of your files, or do so selectively. 
You can even have a file that contains 
the names of files you wish to exclude. 
The names can include wildcards. If you 
have allocated files to your partition in a 
particular arrangement, you can preserve 
it by requesting a DOS image backup. 
Each backup on the tape can be labeled, 
so it’s easy to keep track. You get a host 
of other features as well, all of which 
you can select using batch commands if 
you wish. 
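
An exclusion list with wildcards works much like the following Python sketch. It illustrates the general technique only; the function name is hypothetical, and matching without regard to case is an assumption based on DOS filename conventions rather than anything stated in the product's documentation.

```python
import fnmatch

def select_for_backup(all_files, exclude_patterns):
    """Drop every filename that matches one of the wildcard patterns."""
    selected = []
    for name in all_files:
        # Assumption: DOS-style matching ignores case.
        excluded = any(
            fnmatch.fnmatchcase(name.lower(), pattern.lower())
            for pattern in exclude_patterns
        )
        if not excluded:
            selected.append(name)
    return selected
```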

The 8000+ comes in both internal and 
external versions. The external version 
can be configured for horizontal or verti¬ 
cal placement. I used the vertical instal¬ 
lation, so the 8000+, which is 2.7 inches 
wide, 11.2 inches long, and 5.2 inches 
high, takes up little room on my desk. 

If you need a tape backup system, I 
strongly recommend the 8000+ with the 
Mach2 board. Together they list for 
$1,300. Contact Mountain Computer, 
360 El Pueblo Road, Scotts Valley, CA 
95066, phone (408) 438-6650. 

Three-dimensional animation 


Edward Gordon 

Acrospin, by Acrobits, is an interac¬ 
tive program that provides three-dimen¬ 
sional animation for specified objects. 
Once an object is displayed, you can dy¬ 
namically rotate it through two different 
translations, scale it, and pan over it in 
its current plane. You can specify the ob¬ 
jects, which consist of line segments and 
points, to be any combination of colors. 

Since an object consists of points 
drawn in three dimensions, plus line seg¬ 
ments connecting these points, the files 
that describe these objects contain point 
and line descriptions. You can import 
these descriptions from AutoCAD, 
Cadkey, DesignCAD, Image-3D, and 
other currently available drawing tools 
that provide objects as collections of 
lines and points. If you do not want to 
use a CAD program to draw the object, 
then you can hand-code it or have the 
program generate it. 
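
A points-and-lines object and its rotation are simple to model. The Python sketch below is a generic illustration of the idea: a list of 3D points, a list of index pairs for line segments, rotation about two axes, and a flat projection for display. It does not reproduce Acrospin's file format or algorithms, and every name in it is hypothetical.

```python
import math

# A unit square in the z = 0 plane: points plus index pairs for lines.
POINTS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
LINES = [(0, 1), (1, 2), (2, 3), (3, 0)]

def rotate_y(point, degrees):
    """Rotate a point about the vertical (y) axis."""
    x, y, z = point
    a = math.radians(degrees)
    return (x * math.cos(a) + z * math.sin(a), y,
            -x * math.sin(a) + z * math.cos(a))

def rotate_x(point, degrees):
    """Rotate a point about the horizontal (x) axis."""
    x, y, z = point
    a = math.radians(degrees)
    return (x, y * math.cos(a) - z * math.sin(a),
            y * math.sin(a) + z * math.cos(a))

def view(points, lines, yaw, pitch):
    """Return 2D line segments after rotating, dropping depth."""
    turned = [rotate_x(rotate_y(p, yaw), pitch) for p in points]
    flat = [(x, y) for x, y, _z in turned]
    return [(flat[a], flat[b]) for a, b in lines]
```

Because lines refer to points by index, only the points need to be transformed on each redraw; the line list never changes.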

The object descriptions consist of lines 
of ASCII text. The inclusion of a few 
Pascal programs that you can use to gen¬ 
erate objects makes learning how to do 
so easy. Some of the modules can be 
used as output routines for graphical rep¬ 
resentation of your mathematical models. 
You can view these models from all possible 
angles, at any granularity of detail, 
and with any part or parts showing at any 
given time. With a little bit of ingenuity, 
you can make animated presentations by 
adding an input front end. 

The program uses the “fastest video 
driver routines ever written for the IBM 
PC” says the vendor. When I tested it on 
an AT, it provided smooth transitions 
from one orientation to the next. Erasure 
of the current view and redraw of the 
new transformation of the object occur 
instantaneously and undetectably. With a 
sufficiently complicated object, the redraw 
time might be noticeable, but I 
couldn’t create any objects where this 
happened. The speed of display varies, 
and you can make rotation as slow or as 
quick as you want. The available speed 
depends only on the object. On most objects, 
using the faster settings produced 
rotations quick enough to be indiscernible 
with current display technology. 

The feature called layering deserves 
special mention. Layering allows an object 
to consist of one or more layers or 
portions of the drawing. Each part of the 
object can exist in all three dimensions 
and can be selectively made visible and 
invisible. For example, if you drew the 
human body, you could make the skeleton, 
the musculature, the organs, the 
lymphatic system, etc., into separate layers 
and selectively display them to show 
the interaction of the various systems 
within the body. 

The program comes bundled with 
Acrotran, Acrosign, a tutorial, several 
demonstration objects, and several Turbo 
Pascal programs for generating 3D objects. 
The Acrotran program translates 
CAD-generated .dxf files to Acrospin 
format. The extensive and complete tutorial 
helped me learn how to use the 
program’s features in less than an hour. 
Acrosign, a program written in Pascal, 
takes textual input and creates a 3D sign 
for use with Acrospin. 

Acrospin is produced by Acrobits, 
P.O. Box 5563, Redwood City, CA 
94063-0563, phone (415) 369-0838. The 
suggested retail price is $30. The program 
runs on an IBM PC or clone and 
requires DOS 2.1 or later and 80 Kbytes 
of memory. No special graphics adapter 
is necessary, because it conforms with 
CGA, EGA, MCGA, VGA, or Hercules 
standards. 

A white mouse 


T.L. (Frank) Pappas 

Mouse Systems’ new White Mouse, 
which is mechanical, has three buttons. 
I’ve used Mouse Systems’ optical mouse 
for several years and thought that I’d 
never like a mechanical one. I was 
wrong. This mechanical mouse is a 
pleasure to use. 

The White Mouse fits nicely in your 
hand, making it easy to maneuver. With 
three buttons, it can operate using the 
Mouse Systems (three-button) protocol 
or the Microsoft (two-button) protocol. A 
switch underneath the mouse selects 
which protocol it should boot up under. 
Therefore, if you want to use the mouse 
with programs already configured for the 
Microsoft Mouse, that’s not a problem. 
That’s especially nice for programs like 
Windows or Excel that must be reinstalled 
and reconfigured to change the 
mouse driver. 

The base resolution of the White 
Mouse is 350 CPI (counts per inch). Normally, 
the resolution of a mouse is fixed, 
so if the mouse moves one 
inch, the cursor moves its 
base resolution: 350 
counts for the White Mouse. 

But Mouse Systems provides 
nine acceleration levels 
that let the distance 
moved depend on the speed 
with which you move the 
mouse. In particular, a flick 
of your wrist can move the 
cursor from the top of the 
screen to the bottom. 

The lowest acceleration 
level would be used with 
low-resolution graphics and text (such as 
CGA 320x200). The highest acceleration 
level would be used with high-resolution 
graphics (such as VGA 640x480). You 
can easily adjust the acceleration by 
pressing the Ctrl-Alt keys and the left 
mouse key. The resulting beep indicates 
that you should enter a digit from 0 to 9 
for the acceleration level. Entering 0 
turns off the acceleration. You might 
want to turn it off if the software you use 
has its own acceleration built in. If the 
supplied acceleration levels are not ade¬ 
quate, software is provided for altering 
the acceleration curves of the available 
levels. 
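
Speed-dependent acceleration can be pictured as a gain applied to the raw count. The Python sketch below uses an invented linear curve purely for illustration; the real curves supplied by Mouse Systems are not documented in this review, and the saturation speed, the gain formula, and the function name are all assumptions.

```python
MAX_SPEED = 2000.0  # hypothetical speed (counts/second) where gain peaks

def cursor_counts(raw_counts, speed, level):
    """Scale raw mouse counts by a speed-dependent gain.

    level 0 turns acceleration off; levels 1 to 9 raise the gain
    that a fast movement can reach.
    """
    if not 0 <= level <= 9:
        raise ValueError("level must be a digit from 0 to 9")
    if level == 0:
        return raw_counts
    fraction = min(abs(speed), MAX_SPEED) / MAX_SPEED
    gain = 1.0 + level * fraction  # invented curve, for illustration
    return round(raw_counts * gain)
```

With any curve of this shape, a slow one-inch movement still yields about 350 counts for precise work, while a fast flick of the same length carries the cursor much farther.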

Software supplied with the White 
Mouse allows you to provide mouse support 
for software that does not have it, by 
writing and compiling menu programs. 
Menu programs are provided for all the 
standard mouseless programs, but the 
text editor that I use isn’t among them. 

My editor has pop-up menus driven 
from a main menu, so it only took me a 
few minutes to create a menu program 
for it. I defined one key as the Enter key, 
another to mark the beginning of a re¬ 
gion for cutting and copying text, and a 
third to invoke the main menu. The only 
other feature I needed to add to the menu 
program was cursor movement. If my 
editor did not have pop-up menus, it 
wouldn’t be difficult to provide them as 
well. 

PC Paint Plus and Power Panel also 
come with the White Mouse. The former 
is a standard type of paint program, so I 
won’t discuss it. Power Panel, however, 
is a nice menu-driven program for controlling 
your DOS environment. It comes 
configured to invoke standard DOS ser¬ 
vices such as format, backup and restore, 
and chkdsk. During installation it 
searches your hard disks for programs 
that it knows about and builds menus for 
them. Power Panel found my Norton 
Utilities and Fastback Plus programs and 
placed them under a utilities menu. Simi¬ 
larly, it found Kermit and my fax soft¬ 
ware and placed them under a communi¬ 
cations menu. You can create your own 
menus and submenus as well. 

Predefined entries such as the DOS 
services menu, the calendar, the calcula¬ 
tor, and the notepad, appear in a func¬ 
tion-key menu at the bottom of the 
screen. User-defined menu entries are se¬ 
lectable via a single-character keystroke, 
defined when you create the menu. The 


character is displayed with the menu entry.
You can select either kind of menu item by
double-clicking.

Communication software is provided 
with Power Panel. The software lets you 
build a telephone directory to automati¬ 
cally dial numbers. Power Panel also has 
a directory program that lets you look at 
the directory tree. Double-clicking an en¬ 
try opens up the directory. Then you can 
copy, move, edit, or view the directory 
files. 

I really like the White Mouse, and I 
think you will too. It retails for $99.95, 
including the software. A bus version, 
which provides an extra serial port on a 
card, retails for $119.95. Contact Mouse 
Systems, 47505 Seabridge Drive, Fre¬ 
mont, CA 94538, phone (415) 656-1117. 
Full-featured fax at an affordable price


Sorel Reisman 

Fax96 With 1 Liner (Version 2.04) is a 
PC-bus (8-bit) facsimile board with sup¬ 
porting software for PC compatibles. 
There are two versions of this board, one 
with the “1 Liner” feature and one with¬ 
out. If you purchase the board with the 
one-line feature, you can use the board in 
conjunction with your answering ma¬ 
chine — the devices share the same tele¬ 
phone line. If a voice call comes in, the 
answering machine operates normally; if 
a fax comes in, Fax96 disconnects the 
answering machine and initiates the fax 
receiving process. Of course, you can in¬ 
stall your modem on this same single¬ 
telephone-line configuration. 

Installation of the board requires set¬ 
ting jumpers for the I/O port address and 
IRQ (interrupt request) that the board 
will use with your computer. The infor¬ 
mative manual, which is chatty and cas¬ 
ual in tone, makes this a painless and 
easy process. You can be up and running 
quickly if you have DOS 2.1 or higher on 
your PC or compatible, as well as 384 
Kbytes, a hard drive, a floppy drive, and 
a graphics display adapter (Hercules, 
CGA, EGA, or VGA). 

Installation of the software, although 
also easy, requires customization of your 
virtual fax machine. For example, you 
can set up a cover letter containing your 
name, company name, etc. that will be 
included with every fax you send. During 
installation you can select your printer 
(the new version of the software has a 


more extensive list than the previous re¬ 
lease) and scanner, if you have one. (The 
scanner list is still very limited and 
doesn’t contain any of the inexpensive 
handhelds.) 

The program files install into one sub¬ 
directory, creating another called the 
“Fax Storage Directory.” This provides 
a handy and convenient way to keep the 
data files separate and distinct from 
program files. You will find this a par¬ 
ticularly useful concept because that 
subdirectory needs to be accessed by the 
applications that you use to create the 
files you will send as well as those that 
you receive. 

Creating and sending faxes is straight¬ 
forward. The software recognizes ASCII, 
PCX, and TIFF (packed, uncompressed, 
or CCITT) file formats and converts 
them to a fax file format at the time of 
sending. It’s too bad that the package 
doesn’t include a utility for converting 
received fax format files back to PCX or 
TIFF, but maybe next time. Currently, 
the software determines the file format 
of files you want to send, based on the 
file extension you use when you create 
the file. 
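
Extension-based format detection of this kind can be sketched in a few lines. The table below is an assumption for illustration: the review names ASCII, PCX, and TIFF as supported formats but does not say which extensions the software expects.

```python
import os

# Hypothetical extension-to-format table; the actual extensions that
# Fax96 recognizes are not given in the review.
FORMATS = {".txt": "ASCII", ".pcx": "PCX", ".tif": "TIFF"}

def detect_format(filename):
    """Return the input format implied by the file extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMATS.get(ext)
```

One consequence of this design is that a misnamed file would be converted as the wrong format, so the extension you choose when creating the file matters.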

This latest release of the software also 
includes the ability to send a number of 
faxes at one time, more or less automati¬ 
cally. For example, you can specify a list 
of files to send sequentially; you can also 
create and maintain a fax “broadcast” list 
for sending to a list of receivers. (If you 


do decide to update your software, even 
though the new version overwrites the 
files on your fax subdirectories, your 
previously created phone books and logs 
are maintained.) 

I thought the user interface both color¬ 
ful and interesting. Although the manu¬ 
facturer says the screen is designed to 
emulate a fax control panel, they have 
greatly improved on any panels I have 
ever seen. Of course, CGA, EGA, or 
VGA offers more opportunity for crea¬ 
tivity than does the standard LCD dis¬ 
play available on most fax machines. 

You can install the program as a
terminate-and-stay-resident (TSR) program if
you wish. If you are working on an appli¬ 
cation and the phone rings, a menu ap¬ 
pears on the screen to allow you to an¬ 
swer the phone or begin receiving a fax. 
In practice, you might answer the phone, 
learn that a fax will be coming, choose 
the receive option, and sit back while the 
fax file is captured to the Fax Storage 
Directory. When the completion message 
appears, you can go back to your main 
application. 

If you run the program in dedicated 
mode (which you must to send a fax), the 
screen displays colored “buttons” that 
move in and out in a kind of animated 
3D fashion. If you select an option, for 
example, by typing “G” to select the 
“Go” button, another screen appears with 
another set of moving buttons. Selecting 
buttons allows you to move through a 
tree structure of menu screens, each displaying
the options available for the buttons
shown on that screen. Although this
description might sound strange, in actu¬ 
ality you soon get the hang of the pro¬ 
cess. It seemed really foolproof. I always 
found myself at the next appropriate 
screen, being prompted to do the next 
logical process, either for sending or re¬ 
ceiving a fax. 

I first tested the product by faxing the 
warranty file that Frecom includes as 
part of a tutorial/demo. To be honest, I 
wasn’t really sure what I was doing and 


had to trust the status messages that in¬ 
formed me a fax was being sent and when 
the process finished. However, I prepared 
a number of files using an ASCII word 
processor and a paint program (TIFF for¬ 
mat) and successfully sent them to an¬ 
other (real) fax machine. I also received 
some faxes in both background (TSR) and 
dedicated mode. One of the options avail¬ 
able in dedicated mode is the ability to 
view the bit-mapped version of the sent or 
received fax. That process is a little slow 
(I used a 10-MHz AT clone with EGA 
graphics), but the image was very readable.
Of course, the laser-printed hardcopy
image was much better.

Fax96 is a solid product with well-designed
supporting software and a complete
and readable manual. I thought it
very reasonably priced at $195. Thus, I
recommend the product as a useful peripheral
that extends your PC’s utility
beyond the RS-232 world of data communications.
Fax96 is available from
Fremont Communications, 46309 Warm
Springs Blvd., Fremont, CA 94539,
phone (415) 438-5001.


The infinite font cartridge


Richard Eckhouse

Recently, I had a chance to look at
Publisher’s Type Foundry (PTF) from
ZSoft. Having used and enjoyed ZSoft’s
PC Paintbrush Plus for Windows, I knew
from experience that PTF would be an
interesting and challenging addition to
my cadre of graphic utilities.

In a nutshell, PTF is a font and symbol
editor that runs under Windows. It consists
of three separate parts: the bitmap
editor, the outline editor, and the utilities
for translating and downloading fonts
(for the HP Laserjet or Postscript printers
as well as for Ventura and Pagemaker).
Since you might want to scan in an existing
font, you will appreciate the inclusion
in this package of PC Paintbrush
Plus for Windows.

The idea behind a type foundry is that
you can use it to either modify existing
fonts or create new ones. The bitmap editor
lets you view the individual picture
elements or pels that make up a character.
The outline editor lets you work with
characters in terms of their shapes. As a
result, the outline editor allows you to
store character sets as mathematical formulations
not tied to any display resolution.
The outline form can then be converted
to a bitmap representation in any
size and resolution, a process called
“realization.” You can then touch up the
realization in the bitmap editor to produce
the best-looking font for a given point
size.

While many of us would just as soon
leave this process up to the vendors of
the desktop publishing and soft font
products we use, PTF does offer capabilities
for the more adventurous. For example,
you can create special symbols
that might not exist in the standard packages.
These might include the copyright
or registered symbols, but they could
also be special logos. In addition, you
could use PTF to create particularly large
(or small) fonts. Whatever the applications,
PTF provides an interesting and
easy method for accomplishing the task.

PTF, being a Windows product, uses
the familiar collection of icons and
pull-down menus. Each of the editors has a
different set of menus because you are
essentially dealing with quite different
objects.

The manual, while seemingly a bit disorganized,
contains all the necessary
documentation as well as tutorials to get
you through the learning process. The
manual comes in three sections corresponding
to the three products bundled
together. There was a healthy addendum
for the version 1.22A I reviewed. I
would have preferred a revised manual,
or at least the ability to insert revision
pages where appropriate. Each manual
section is separately indexed. When you
register the product, you are offered a
free videotape that demonstrates the essentials
of both editors.

The eight tools in the bitmap editor include
Line, Pencil, Paintbrush, Fill,
Filled Polygon, Cut/Paste, Tape Measure,
and Zoom. They operate, for the
most part, as their corresponding icons
would in a paint program. Each has its
own cursor, so you know which tool has
been selected.

I found one tool really fascinating.
Called the Gadget Box, it comes up with
Cut/Paste and allows you to rotate, scale,
flip, and transform characters. Every
paint program should have one of these
because it makes it so easy to twist,
stretch, and make symmetric any bit image.
This alone makes the product quite
valuable to me because, even if I am not
interested in creating fonts, I can build
up a symbol, use the Gadget Box to manipulate
it, and then cut/paste the results
into the Windows clipboard for use by
another program.

The tools for the outline editor include 
Add Line, Add Curve, Shape, Select, 
Knife, Move, Measure, and Zoom. The 
Shape, Select, and Knife tools help you 
remove the many short line segments 
that result when you convert a bitmapped 
character into its equivalent outline form. 
As in the bitmap editor, each tool has its
own unique cursor. When using these
tools, you begin to understand the differ¬ 
ences between outline and bitmapped 
fonts and how to tailor each character so 
it looks its best when printed. 
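
To make the outline-versus-bitmap distinction concrete, here is a minimal sketch of "realization": one resolution-independent outline rasterized onto grids of different sizes. It assumes a simple closed polygon and an even-odd fill rule sampled at pixel centers; PTF's actual outline format and rasterizer are not described in the review, and the names below are hypothetical.

```python
def realize(outline, size):
    """Rasterize a closed polygon (vertices in 0..1 units) onto a size x size grid."""
    rows = []
    n = len(outline)
    for r in range(size):
        y = (r + 0.5) / size                  # sample at the pixel center
        row = ""
        for c in range(size):
            x = (c + 0.5) / size
            inside = False
            for i in range(n):
                x1, y1 = outline[i]
                x2, y2 = outline[(i + 1) % n]
                # Even-odd rule: toggle on each edge crossed by a ray going right.
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > x:
                        inside = not inside
            row += "#" if inside else "."
        rows.append(row)
    return rows

# A triangle stored once as an outline, independent of any resolution.
TRIANGLE = [(0.5, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Calling `realize(TRIANGLE, 8)` and `realize(TRIANGLE, 32)` yields two bitmaps of the same shape; hand-editing individual cells of the smaller one corresponds to the touch-up step done in the bitmap editor.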

Using the bitmapped and outline edi¬ 
tors is both fun and interesting. With a 
little practice I found I could use the 
product to produce some interesting 
typefaces from the scanned image of my 
letterhead. Clearly, this product is much 
more versatile than other typeface prod¬ 
ucts offering only canned fonts and fixed 

The recommended configuration for 
this software is an AT-class machine, a 
hard disk, a color monitor, a mouse, and 
Windows version 2.0 with at least 512 
Kbytes of EMS memory. I did find that I 
could not run it under Windows 3.0 in 
real mode because the icons were not 
visible. 

PTF lists for a hefty $545, so you 
won’t want to buy it unless you really 
need it. But if you do need it, the price is 
really not important, since you can’t get 
this kind of capability without hiring a 
professional. 

Contact ZSoft, 450 Franklin Rd., Suite
100, Marietta, GA 30067, phone (404) 
428-0008. 
TK Solver Plus for the Macintosh


Giovanni Perrone 

Let me start right out by telling you 
this review will be short (and sweet?) for 
a very simple reason: TK Solver Plus 
Version 1.1 (I’ll call it TK from now on) 
is clearly superb software that fills a 
unique niche. It’s reasonably priced, 
friendly enough for a novice, and sophis¬ 
ticated enough for a professional. It’s 
powerful, accurate, versatile, and ex¬ 
tremely useful. Simply put, TK provides 
one of the best software values available 
today — for anyone who needs to crunch 
numbers. A need (or desire) to crunch 
numbers is a basic assumption. If you 
don’t crunch numbers, then TK probably 
won’t move you. 

There is nothing like a good first im¬ 
pression, and that’s how TK comes on — 
strong. Its power and value become in¬ 
stantly obvious. By the time you have 
finished the excellent HyperCard tutorial 
and the tutorial chapter in the Introduction
Manual, and worked through a few of
the advanced examples, you’ll be
hooked. It reminded me of countless 
hours spent solving applied mathematics 
and engineering problems with only the 
assistance of a “primitive” electronic cal¬ 
culator. TK Solver Plus sure can make 
life easier. 


This latest 
implementation 
of TK Solver, for 
the Macintosh, 
is perhaps the best. 


TK Solver Plus is a numerical analysis 
“toolkit,” hence the “TK” in the name. It 
has built-in tools, such as TK’s built-in 
functions, direct and iterative solvers, 
editors, calculators, and so on. TK also 
provides an architecture for extensibility, 
or hang-on tools, with features such as 
library functions, models, and predefined 
solver packs. You get a versatile toolkit 
indeed, with all the capability necessary 
to define your own functions, models, 
and applications to solve a broad range 
of problems. 

TK Solver is not new. It has been 
around for IBM PC compatibles since 
early 1983, and was one of the first 
Macintosh software products in 1984. 


TK Solver Plus for the Macintosh is derived
from the original Macintosh version
(TK!Solver from Software Arts) and
the latest PC-compatible version by Universal
Technical Systems (UTS).

TK Solver Plus expands on the original
TK!Solver’s features with high-resolution
graphics, math coprocessor
support, ASCII and Lotus interfaces, 
context-sensitive help, debugging tools, 
and user-defined numeric formats and 
procedures. The number of equations 
(more than 1,000) remains the same, but 
the new version provides 75 (was 40) 
built-in functions, more than 200 (was 0) 
library functions, more than 100 (was 
10) sample models, and 16 (up from 12) 
significant digits. New numerical analy¬ 
sis capabilities include differentiation 
and integration, differential equation 
solving, statistical analysis, optimization,
complex numbers, Boolean logic, and interactive
display tables.

In addition to PC and Mac versions, 
TK Solver Plus also comes in versions 
for VAX/VMS and Unix-based (such as 
the IBM RT AIX, HP/UX, and Sun) 
computers. TK operates the same on all 
platforms, except for environment-spe¬ 
cific features like the Mac’s graphical 
interface, and the TK models are compat¬ 
ible among all the platforms. But this 
latest implementation of TK, for the 
Macintosh, is perhaps the best. The 
rule-based and object-oriented (no kid¬ 
ding) TK sheet architecture is a natural 
in the Mac’s multiwindow desktop envi¬ 
ronment. 

As UTS explains, TK combines a 
high-level, rule-based declarative lan¬ 
guage with a simple, conventional proce¬ 
dural language. The rule-based language 
processes the equations, and the proce¬ 
dural language allows you to expand 
TK’s built-in functions by creating your 

TK’s object-oriented sheet architecture 
is derived from the concept of work¬ 
sheets, or sheets of paper normally used 
to organize your work when solving 
problems with pencil and paper. TK uses 
eight sheets to “hold” the objects for its 
models. The eight sheets (or TK objects) 
are named Variable, Rule, Function, 

Unit, List, Plot, Table, and Numeric For¬ 
mat. Six of the sheets have subsheets — 
Variable, Rule, Function, List, Plot, and 
Table — that contain detailed informa¬ 
tion about each object. Whenever you 
modify or solve the model, TK automati¬ 
cally updates the sheets and subsheets. 

The Mac’s multiwindow interface lets
you display, size, and position any num¬ 


ber of sheets and subsheets on the screen 
(including plots and tables). This allows 
you to simultaneously work with as 
many objects as necessary when solving 
a problem. The Macintosh version also 
offers increased calculation precision 
using 19-20 significant digits (instead 
of 16). 
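
The step from 16 to 19-20 digits is about what you would expect from moving from a 64-bit double (53-bit significand) to an 80-bit extended format (64-bit significand). The review does not say which formats TK uses, so treat that as an assumption; the arithmetic itself is easy to check:

```python
import math

def decimal_digits(significand_bits):
    """Approximate decimal digits carried by a binary significand."""
    return significand_bits * math.log10(2)

double_digits = decimal_digits(53)    # IEEE 754 double precision
extended_digits = decimal_digits(64)  # 80-bit extended precision
```

The two values come out just under 16 and a little over 19, which lines up with the figures quoted above.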


TK Solver combines 
a high-level, 
rule-based 
declarative language 
with a simple, 
conventional 
procedural language. 


An easy to overlook subtlety that 
clearly exemplifies TK’s usefulness is 
referred to as “backsolving.” This is the 
ability to detect and automatically per¬ 
form inverse operations when solving 
equations. In other words, once you enter 
an equation on the Rule sheet, you can 
solve for the unknowns in any order, 
without rearranging. 
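
The effect of backsolving can be imitated with a small sketch: one rule, entered once, solved for whichever variable is left unknown. Note the swap in technique: the review says TK detects and performs inverse operations, whereas this example simply runs bisection on the rule's residual. The function names, the search interval, and the sample rule are all hypothetical.

```python
def backsolve(rule, known, unknown, lo=-1e6, hi=1e6, tol=1e-9):
    """Solve rule(...) == 0 for `unknown` by bisection on [lo, hi]."""
    def residual(x):
        return rule(**known, **{unknown: x})

    f_lo, f_hi = residual(lo), residual(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = residual(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return (lo + hi) / 2

# One rule, written as a residual: distance = speed * time.
def travel(distance, speed, time):
    return distance - speed * time
```

With this in place, `backsolve(travel, {"speed": 60, "time": 2}, "distance")` and `backsolve(travel, {"distance": 120, "speed": 60}, "time")` both work from the same rule, which is the convenience the review describes.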

The two softbound manuals for TK — 
the introduction and a detailed reference 
— are nicely structured, comprehensive, 
and clearly written. UTS clearly supports 
TK, offering many useful accessories. 
TK!Solverpacks are predefined models
for specific applications, such as Finan¬ 
cial Analysis, Introductory Science, Me¬ 
chanical Engineering, Electrical Engi¬ 
neering, and more. They cost $70 each. 
TK/CAD Link ($195) lets you combine 
the drafting power of AutoCAD with 
TK’s equation-solving power. The 
widely used McGraw-Hill engineering 
reference book, Roark's Formulas for 
Stress & Strain by Warren C. Young, is 
available as a new application package 
called “Roark & Young on TK” ($595). 

You can sample TK Solver Plus with a 
reduced capability package called Mini- 
TK, available for the PC or Mac, for $20. 
If you crunch numbers, I highly recom¬ 
mend you get TK Solver Plus from 
Universal Technical Systems, 1220 Rock 
St., Rockford, IL 61101, phone (800) 
435-7887. 
Advance Program

Ninth Symposium on 
Reliable Distributed Systems 



Sponsored by In cooperation with 

IEEE Computer Society                    TC on Fault-Tolerant Computing

TC on Distributed Processing 

October 9-11, 1990, Marriott Hotel, Huntsville, Alabama 


Tuesday, October 9, 1990

8:00 am - 9:00 am     Registration
9:00 am - 5:00 pm     Tutorial

Wednesday, October 10, 1990

7:30 am - 8:30 am     Registration
8:30 am - 9:00 am     Opening Remarks: Raif M. Yanney, TRW Inc.
                      Luca Simoncini, IEI-CNR
9:00 am - 10:00 am    Keynote Speech: TBD
10:00 am - 10:30 am   Break

Session 1:            Distributed Operating Systems/Chair: TBD
                      "The Design and Implementation of a Reliable Distributed Operating System"
                      Tony R Ng, University of Illinois at Urbana-Champaign
                      "Using Stashing to Increase Node Autonomy in Distributed File Systems"
                      Rafael Alonso, Daniel Barbara, and Luis L. Cova, Princeton University
                      "Distributed Lock Management in a Transaction Processing Environment"
                      Andrew B. Hastings, Carnegie-Mellon University
Noon - 1:30 pm        Lunch

Session 2:            Networks and Communications/Chair: Tom Lawrence, RADC
1:30 pm - 3:00 pm     "An Improved Algorithm for the Symbolic Reliability Analysis of Networks"
                      Malathi Veeraraghavan, AT&T Bell Labs, and Kishor Trivedi, Duke University
                      "Reliable Broadcast for Fault Tolerance on Local Computer Networks"
                      Paulo Verissimo and Jose Alves Marques, INESC, Portugal
3:00 pm - 3:30 pm     Break

Session 3:            Distributed Data Bases/Chair: Sang Son, University of Virginia
3:30 pm - 5:00 pm     "A Low-Cost Atomic Commit Protocol"
                      James W. Stamos and Flaviu Cristian, IBM Almaden Research Center
                      "Adaptability Experiments in the RAID Distributed Data Base System"
                      Bharat Bhargava, Karl Friesen, Abdelsalam Helal, and John Riedl, Purdue University
                      "Fault-Tolerant Distributed Data Base Systems via Data Inference"
                      Wesley Chu, Andy Hwang, Rei-Chi Lee, Qimeng Chen, and Matthew Merzbacher, UCLA
5:30 pm - 7:30 pm     Reception at Space Museum

Thursday, October 11, 1990

7:30 am - 8:30 am     Registration

Session 4:            Replication and Real Time/Chair: William Mahaffey, TRW Inc.
8:30 am - 10:00 am    "Representation and Execution Support for Reliable Robot Applications"
                      Prabha S. Gopinath, Damian M. Lyons, and Sandeep Mehta, Philips Labs
                      "Preventing State Divergence in Replicated Distributed Programs"
                      Alan Tully and Santosh K. Shrivastava, University of Newcastle upon Tyne
                      "Adjudicators for Diverse-Redundant Components"
                      F. Di Giandomenico and L. Strigini, IEI-CNR, Italy
10:00 am - 10:30 am   Break

Session 5:            Algorithms/Chair: Farokh Bastani, University of Houston
10:30 am - Noon       "Voting as the Optimal Pessimistic Scheme for Managing Replicated Data"
                      Piotr Berman and Mirjana Obradovic, Pennsylvania State University
                      "A Comparison of Voting Strategies for Fault-Tolerant Distributed Systems"
                      Douglas M. Blough, University of California, Irvine, and Gregory F. Sullivan, Johns Hopkins University
                      "A Fault-Tolerant Algorithm for Distributed Mutual Exclusion"
                      Ye-In Chang, Mukesh Singhal, and Ming T. Liu, Ohio State University
Noon - 1:30 pm        Lunch

Session 6:            Architecture/Chair: David Powell, LAAS-CNRS, France
1:30 pm - 3:00 pm     "RelaX - An Extensible Architecture Supporting Reliable Distributed Applications"
                      R. Kroger, M. Mock, R. Schumann, and F. Lange, GMD, West Germany
                      "Temporal Uncertainties in Interactions among Real-Time Objects"
                      Hermann Kopetz, Technische Universität Wien, and K.H. (Kane) Kim, University of California, Irvine
                      "A Methodology for the System State Characterization of Event Recognitions"
                      M. Spezialetti, Lehigh University, and J.P. Kearns, College of William and Mary













Tutorial: Fault Tolerance in Real-Time Distributed Systems 

Presenter: Barry W. Johnson, Department of Electrical Engineering, Center for Semicustom Integrated Systems, University of Virginia,
Charlottesville, Virginia.

Audience: Engineers and scientists interested in an introduction to the design and analysis of fault-tolerant systems for real-time, 
distributed applications. 

Abstract: Real-time distributed systems provide a significant challenge to designers because of demanding time constraints and 
performance requirements. The design problems become increasingly more difficult when fault tolerance must be included as an 
attribute of the system. This tutorial provides the participant with an introduction to important issues, fault tolerance techniques, 
fault avoidance procedures, and analysis methods specifically useful in the design of real-time distributed systems. In addition,
examples from practical applications of the technology are presented to solidify the theoretical concepts.

The purpose of this tutorial is to introduce the participants to techniques for designing and analyzing fault-tolerant, real-time distributed systems. 
The tutorial begins with an introduction to the relevant terminology and the design problems typically encountered. Fault tolerance and fault
avoidance techniques are reviewed, and the design process used to employ the techniques is explored. Both hardware and software issues are
considered. Evaluation metrics such as reliability, availability, safety, maintainability, and performability are introduced to provide measures of the quality
of a design. Analytical modeling methods such as combinatorial, Markov, semi-Markov, and Petri nets are presented to show ways in which the 
various quantitative measures can be determined. Simulation-based approaches from performance modeling theory are also discussed as a means of 
evaluation. Finally, important issues and techniques associated with implementing fault-tolerant, real-time systems using Very Large Scale Integration 
(VLSI) and Wafer Scale Integration (WSI) technology are discussed. In all cases, the theoretical discussions are supplemented with small, practical 
examples that clearly illustrate the technology. 

The remainder of the tutorial deals specifically with a more complex and detailed example of the presented technology. A practical application is 
used to demonstrate the problems encountered by the designer, the design process pursued, the specific fault tolerance techniques applied, the 
analysis performed, and the results obtained. 


Biography: Barry W. Johnson is currently an associate professor in the Department of Electrical Engineering at the University of Virginia. 

He is also a cofounder and member of the Center for Semicustom Integrated Systems, a Technology Development Center of the Virginia 
Center for Innovative Technology. Prior to joining the University, he was with Harris Corporation in Melbourne, Florida, where he participated 
in the design and analysis of fault-tolerant computer systems for aerospace applications. His research and teaching interests include fault- 
tolerant computing, VLSI architectures, VLSI testing, and microprocessor-based systems. He is the author of a textbook entitled "The Design 
and Analysis of Fault-Tolerant Digital Systems," published by Addison-Wesley Publishing Company. In addition, he is the author or coauthor 
of two book chapters and more than 50 papers in his technical areas of interest. Johnson is presently active in the IEEE Computer Society
as Vice President for Membership and Information Activities, a member of the Executive Committee, an ex-officio member of the Board of 
Governors, and a member of the editorial board of the IEEE Transactions on Computers. He is also active in the IEEE as a member of the 
Technical Activities Board (TAB) and chair of the TAB Finance Committee. 

Johnson received BS, ME, and PhD degrees in electrical engineering from the University of Virginia, Charlottesville, Virginia, in 1979, 1980, 
and 1983, respectively. He is a member of the IEEE, the IEEE Computer Society, Tau Beta Pi, Eta Kappa Nu, and Sigma Xi. 


Special Workshop in Conjunction with 9SRDS 

IEEE Workshop on Experimental 
Distributed Systems 


Thursday, October 11, 1990 • 4:00 pm-6:00 pm 
Friday, October 12, 1990 • 8:00 am-4:00 pm 


Topics of Workshop 

• Data Base and File Systems 

• Object and Real-Time-Based Systems 

• Programming Environments and Applications 

• Operating System 

Workshop Program Chair: Bharat Bhargava 
Purdue University 


























NEW PRODUCTS 


Contact or send press releases to Nancy Hays, Computer, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1264; Compmail+, n.hays 


Toshiba laptop EWS employs RISC 


Toshiba claims to have developed the 
first RISC-based laptop engineering work¬ 
station (EWS). Available in Japan, the 
Sparc LT, AS1000/L10 runs on a RISC 
CPU based on Sun Microsystems’ Sparc 
architecture. The laptop results from a 
May 1989 agreement between Toshiba 
and Sun, under which Toshiba obtained 
rights to develop and manufacture com¬ 
puters based on the Sparc architecture and 
SunOS Unix operating system. 

The laptop weighs 17.7 pounds. It reportedly
has a processing speed of 13.2
million instructions per second. Memory
includes 8 Mbytes of main memory using
16 units of the company’s 4-Mbit
DRAMs, a 180-Mbyte hard disk drive, a
64-Kbyte cache memory, and a 3.5-inch
floppy disk drive.

In Japan, the AS1000/L10 costs
¥1,980,000. The company has not determined
details on overseas marketing.


Hewlett-Packard has announced a
new low-end system and a server in its
HP 3000 Precision Architecture RISC
family, geared for on-line transaction
processing. The Series 920 and Server
920 reportedly connect up to 48 PCs or
terminals. They also support up to 20
users simultaneously.

According to the company, the Series
920 targets users who primarily use
terminals for OLTP applications. The
Server 920 suits users who distribute
OLTP applications across a network
of PCs.

The deskside systems come with an
integrated disk drive, a digital-audio-tape
cassette drive, 24 Mbytes of memory
expandable to 56 Mbytes, 48-bit virtual
addressing, 64 Kbytes of CPU cache
memory, and a 64-entry unified
translation lookaside buffer.

The Series 920 comes with HP Turbo
Image and HP System Dictionary software
preinstalled, with an option to add
HP Allbase/SQL. Also preinstalled is the
HP MPE/XL operating system for 20
simultaneously logged-on users.

The Server 920 software support includes
the Vplus/Windows user interface,
NewWave System Services, and
existing HP 3000 applications. The
Server 920 supports PCs connected
through a LAN, plus terminal and PC
virtual-terminal access from other systems
on a network.

The Series 920 and Server 920 cost
$28,000 each.

Hewlett-Packard’s 3000 Series and
Server 920 employ the same Precision
Architecture and packaging.


HDS mainframes 
challenge competitors 

Hitachi Data Systems has announced 
three new models in its EX Series of 
mainframe computers. Models 310 and 
420 are scheduled to ship in the second 
quarter of 1991, while model 85 is 
scheduled for the fourth quarter of 1990. 
The company plans to announce product 
specifications, performance, availability, 
and detailed pricing later. 

EX models support System/370 oper¬ 
ating systems, high-speed optical chan¬ 
nels, remote operations capabilities, and 
four-level storage. According to HDS, 
the EX Series provides ESA/370-capable
mainframes.

The EX 310 features three-way pro¬ 
cessing. The EX 420 is a four-way multi¬ 
processor based on dyadic instruction 
processors. The EX 85 is a three-way 
multiprocessor. The EX 310 and 420 fea¬ 
ture densities of 12,000 gates per semi¬ 
conductor device compared to the 2,000 
and 5,000 gates on other EX models. 

The company claims to have improved 
switching speeds for these devices from 
200 ps to 70 ps. The switching speed of 
the I/O processor has reportedly gone 
from 1,000 ps to 300 ps and the gate den¬ 
sity, from 40,000 to 80,000 gates. 

Central and expanded storage for mod¬ 
els 310 and 420 will be implemented in 
4-Mbit DRAMs, according to the com¬ 
pany, up to a total of 2 Gbytes of central 
storage. Expanded storage ranges up to 4 
Gbytes for model 310 and 8 Gbytes for 
model 420. 

The EX Series supports up to 128 
channels, depending on the model. Chan¬ 
nels can operate at speeds of 3, 4.5, and 
6 Mbytes per second. The EX 85 and 310 
can be configured with up to 48 optical 
channels and the EX 420, up to 96 opti¬ 
cal channels. 

A typical EX 310 configuration of 256 
Mbytes of central storage and 64 chan¬ 
nels will cost about $11,800,000. An EX 
420 in a minimum configuration of 256 
Mbytes of central storage and 64 chan¬ 
nels will cost around $14,400,000. 
Microsoft makes Windows 3.0 available worldwide


Microsoft has announced worldwide 
availability of version 3.0 of its Win¬ 
dows graphical user interface for 
DOS-based PCs. According to the 
company, Windows 3.0 employs a pro¬ 
portionally spaced system font, 3D 
scroll bars and command buttons, and 
colored icons. The interface resembles 
that of Microsoft’s OS/2 Presentation 
Manager. 

The user shell that comes with Win¬ 
dows reportedly provides the tools and 
resources to manage applications and 
files without leaving Windows. The 
Program Manager, a component of the 
shell, presents Windows applications 
and systems functions as colored icons 
that users can rearrange. 

Another shell component, the File 
Manager, allows users to locate and 
manipulate their files, including moving 
files to and from any disk drive 
(including a server). It uses a 
directory-tree format. 

The third shell component, the Control 
Panel, allows users to design their own 
screens. 

A new memory management system 
reportedly speeds up Windows applications 
and allows users to keep multiple 
large applications open and accessible. 
Windows applications can exploit up to 
16 Mbytes of memory. On 386-based 
systems, Windows can exploit the virtual 
protected-mode capabilities and provide 
up to 48 Mbytes of memory. This effectively 
eliminates the DOS 640-Kbyte 
memory limitation. 

In addition to enhanced desktop accessories, 
Windows 3.0 provides new desktop 
applications. These include Recorder, 
a macro utility that records and plays back 
keystrokes and mouse movements; Solitaire, 
an electronic version of the card game; 
and a full-color painting program. 

The company claims that version 3.0 
has built-in “network awareness” for 
easy installation and use in network 
configurations. Users can access hypertext 
help from setup, applications provided 
with Windows, and the user shell. 

Windows 3.0 retails for $149. Users of 
earlier versions can upgrade for $50 by 
calling (800) 323-3577. 

A minimum configuration is a 286-based 
PC with 640 Kbytes of RAM, one 
floppy disk drive, and a hard disk drive. 
With another 256 Kbytes of extended 
memory, version 3.0 will use the protected 
mode. On a 386 machine, version 
3.0 requires 1,024 Kbytes of extended 
memory to use the enhanced mode. Windows 
3.0 also requires DOS 3.0 or 
higher. 


Powerpoint 2.0 uses Windows 3.0 for 
PC presentation graphics 

Microsoft designed Powerpoint version 
2.0 for the Windows 3.0 graphical 
environment. Powerpoint 2.0 provides 
presentation graphics tools to PC users 
who want to create and organize 
presentations. 

According to the company, Powerpoint’s 
graphical user interface lets users 
see all the elements of their visuals 
(text, graphics, and charts with the 
chosen colors, fonts, and backgrounds) 
as they will appear on the finished 
slide or overhead. Users can manipulate 
objects on the screen, mix different 
elements, and exchange data with other 
Windows applications. 

Powerpoint saves an entire presentation 
in a single file. This permits easy 
rearrangement of visuals and automatic 
numbering of pages. Users can also 
copy visuals from one presentation to 
another and output presentations as 
overheads, 35-mm slides, or images 
shown on a computer monitor. 

Tools include a Slide Master, which 
assists users in creating a consistent 
look for visuals in a presentation; a 
Slide Sorter, which simultaneously displays 
miniatures of all visuals in a 
presentation; and a Title Sorter, which 
simultaneously displays the titles of all 
the visuals in a presentation. 

Speaker support materials provided 
by Powerpoint include notes pages, 
which contain a miniature copy of each 
visual at the top of each page, and 
audience handouts, which show miniature 
copies of two, three, or six visuals 
per page. 

Powerpoint comes with more than 
5,000 predesigned color schemes developed 
by Genigraphics, plus more than 
400 of that company’s color clip-art 
images. Shaded backgrounds are available 
in 44 styles. Windows 3.0 provides 16.7 
million colors. 

Powerpoint also includes built-in 
charting capabilities and a conversion 
utility that uses the Apple File Exchange 
to move documents between Macintoshes 
and PCs. Other features are a 
built-in word processor, drawing features, 
graphics importation, outline 
importation, Bitstream Fontware, support 
for a variety of output devices, and the 
ability to turn presentations into an 
electronic slide show. 

Powerpoint 2.0 requires an IBM PC or 
compatible with a 286 or 386 (recommended) 
microprocessor, Windows 3.0, 
a Windows-supported video adapter, a 
Microsoft or compatible mouse, a hard 
disk, and a 5.25-inch or 3.5-inch 
1.44-Mbyte or 720-Kbyte floppy disk drive. A 
256-color video adapter and Windows 
3.0 driver are optional. 

Powerpoint 2.0 retails for $495. 

French and German versions are scheduled 
for August 1990, with Dutch, Italian, 
and Swedish versions planned. 


HP enhances NewWave 
for Windows 3.0 

Hewlett-Packard has enhanced its 
NewWave software in version 3.0 to run 
on Microsoft Windows 3.0. NewWave 
3.0 reportedly offers full agent capability 
and the ability to share objects on a 
network. 

NewWave 3.0 costs $195, with shipments 
scheduled for August 1990. Upgrades 
cost $50. 

Pagemaker enhanced for 
Windows 3.0 

Aldus has enhanced its Pagemaker 
software in version 3.01 to take advan¬ 
tage of Microsoft Windows 3.0. Accord¬ 
ing to the company, version 3.01 runs 
faster than Pagemaker under Windows 
2.0 and bypasses the DOS 640-Kbyte 
barrier. 

Registered owners of Pagemaker 3.0 
for the PC in the US and Canada can up¬ 
grade for $25 by calling Aldus at (206) 
628-2320. Subscribers to Aldus Customer 
First and Extended Technical Support 
Service programs, plus those who 
purchase Pagemaker after May 22, can 
upgrade for free. 

Unisys supplies 
LAN servers 

Unisys claims that its PW 2 800/486- 
25A is the most powerful PC the com¬ 
pany produces. The system incorporates 
Intel’s 25-MHz 80486 microprocessor, a 
floating-point processor, 8 Kbytes of 
cache memory, and support for up to 32 
Mbytes of RAM and 640 Mbytes of 
SCSI disk storage. 

The company has positioned the 800/ 
486-25A as a LAN server, but the PC 
can also function as a multiuser system 
running SCO Unix or Xenix. It supports 
MS-DOS, Windows/386, and OS/2. 

Networking topologies supported in¬ 
clude Ethernet, twisted pair, and Token 
Ring. Networking programs supported 
include Novell Netware, Unix UUCP, 
and LAN Manager. 

A basic configuration comes with two 
asynchronous serial ports, one parallel 
port, mouse with PS/2 connector, clock/ 
calendar, battery backup, three half¬ 
height peripheral device bays, and one 
full-height peripheral device bay for a 
price of $10,795. 

A system with a 5.25-inch 140-Mbyte 
disk drive, 3.5-inch 1.44-Mbyte disk 
drive, SCSI controller, VGA controller, 
mouse, keyboard, 14-inch VGA color 
graphics monitor, and MS-DOS 4.01 
with Windows/386 costs $14,174. A sys¬ 
tem with a 14-inch VGA monochrome 
monitor costs $13,748. 

Unisys has added two 80386-based 
systems to its PW 2 family as well. The 
800/33A server employs a 33-MHz 
80386DX CPU with an 80387 math 
coprocessor as an option, 24 Mbytes of 
memory capacity, and support for up to 
640 Mbytes of SCSI disk storage. It 
functions in the Novell Netware environ¬ 
ment, as a multiuser system running SCO 
Unix or Xenix, or as a communications 
server for LAN workstations. The 500/ 
20A workstation uses a 20-MHz 
80386DX processor with an 80387 math 
coprocessor as an option, supports 160 
Mbytes of SCSI disk storage, and comes 
with five expansion slots. 

The 800/33A basic system includes 4 
Mbytes of memory, a 64-Kbyte memory 
cache, one parallel and two serial ports, 
eight expansion slots, five half-height 
peripheral spaces, and a coprocessor 
socket for $6,800. 

The 500/20A basic system includes 2 
Mbytes of memory, a disk controller, one 
parallel and two serial ports, five expansion 
slots, two half-height peripheral 
spaces, one 3.5-inch peripheral space, 
and a coprocessor socket for $3,795. 

Unisys workstations target Unix market 


Unisys has announced a family of 
Unix workstations called the U 6000/ 
WS. The systems run Unix and MS-DOS 
in a multitasking environment. They em¬ 
ploy the company’s Primary Graphical 
Environment, based on the X Window 
System, OSF/Motif graphical user inter¬ 
face, and X.Desktop utilities. 

All systems come with an Ethernet 
controller. They support TCP/IP proto¬ 
cols and Network File System. 

The family consists of two series. The 
UN6100 systems incorporate Intel’s 33- 
MHz 80386 microprocessor, 8 Mbytes of 
memory expandable to 16 Mbytes, and 
an 80387 floating-point processor. The 
UN6200 series incorporates Intel’s 25- 
MHz 80486 processor and memory ex¬ 
pandable from 8 Mbytes to 32 Mbytes. 

Prices for the new systems start at 
$8,995. 

The U 6000 family now includes the 
entry-level U 6000/10 for two to six us¬ 


ers at prices ranging from $8,000 to 
$20,000. The system comes with an 
80386 CPU, 4-16 Mbytes of memory, 
and an 80- or 160-Mbyte hard disk drive. 

Another new U 6000 member, the U 
6000/60 midrange system, supports up to 
80 users. It comes with an 80486 CPU, 
up to 80 Mbytes of parity memory, and 
up to 2.6 Gbytes of disk storage. The 
company’s first multiuser 486-based sys¬ 
tem, the U 6000/60 costs $35,000 to 
$90,000. 

Like the U 6000/WS family, the U 
6000/10 and U 6000/60 support Locus 
Computing’s Merge 386 multitasking, 
multiuser operating environment. Merge 
386 allows users to run multiple Unix 
and MS-DOS applications under a single 
interface. 

Digital, Apple software integrates VAX, Mac 


Digital Equipment and Apple Computer 
have announced software products result¬ 
ing from their 1988 joint development 
agreement. Lanworks and SQL/Services 
permit Macintosh computers and Appletalk 
networks to work with VAX systems 
and DECnet/OSI networks. The software 
was reportedly designed as an integral 
part of Digital’s Network Application 
Support. 

Lanworks includes VMS server soft¬ 
ware, Mac client applications, network 
software to integrate Appletalk networks 
with DECnet/OSI networks, and devel¬ 
oper tools. NAS services include file shar¬ 
ing, common print services, document 
interchange, electronic mail, common ap¬ 
plication access, and database access. 

Lanworks requires a Macintosh Plus, 
Mac SE or SE/30, Mac II, IIx, IIcx, or IIci 
and 2 Mbytes or more of RAM (recommended). 
Software and documentation is 
$440. Licenses cost $295 per Macintosh. 
User shipments are scheduled for 
September 1990. 

SQL/Services is part of VAX Rdb/VMS, 
version 4.0. It provides remote access 
to VAX Rdb/VMS databases from 
applications running in Macintosh, OS/2, 
VMS, Ultrix, and MS-DOS environments. 
According to the company, the 
Rdb database server resides on the VAX 
system with the client portion in the 
end-user application on the desktop system. 

VAX Rdb/VMS V3.1 with SQL/Services 
ranges in price from $3,348 to 
$215,138. V4.0 with SQL/Services 
supporting Macintosh and OS/2 will be 
available in September 1990. 

Lanworks: Reader Service 44 
SQL/Services: Reader Service 45 


NCR extends System 10000, 9800 families 

NCR has announced the System 10000 
Model 85, which joins the System 10000 
family of multiuser computers at the high 
end. Model 85 consists of multiple dyadic 
processors connected by NCR’s SCSI 
Inter-Processor Bus. An operator with an 
ITX Windows PC Console controls system 
resources for single-point operation 
of the multiprocessor Model 85. Prices 
range from $485,000 to $600,000. 

The Application Processor 4 joins the 
NCR 9800 family of large processors. It 
ranges in price from $650,000 for a base 
system to about $2 million for a high-end 
system. NCR claims that AP4 is the most 
powerful system it offers in terms of 
batch and transaction processing. 


Compaq PCs suit networked environments 


Compaq Computer claims that its 
Deskpro 386N and Deskpro 286N suit 
both networked and stand-alone PC 
applications. Both support network operating 
systems such as Novell Netware and 
Microsoft LAN Manager and network 
interface cards such as Ethernet, Token 
Ring, and Arcnet. 

The Deskpro 386N includes a 16-MHz 
Intel 80386SX microprocessor, while the 
Deskpro 286N uses a 12-MHz 80286 
CPU. Both come standard with 1 Mbyte 
of memory and are available in three 
configurations. Model 40 has a 40-Mbyte 
hard disk drive and a 3.5-inch 
1.44-Mbyte floppy disk drive; Model 1 
includes a 3.5-inch 1.44-Mbyte floppy disk 
drive; and Model 0 has no drives. 

Suggested resale prices of the Deskpro 
386N Models 40, 1, and 0 are $3,199, 
$2,399, and $2,299, respectively. 
Suggested resale prices of the Deskpro 286N 
Models 40, 1, and 0 are $2,599, $1,799, 
and $1,699, respectively. 

A 16-bit Compaq Video Graphics System 
comes standard on both systems. It 
supports VGA, EGA, and CGA graphics 
resolution and displays up to 256 colors 
simultaneously. It also provides 
132-column text support. Also standard are 
parallel, serial, pointing device, and keyboard 
interfaces, plus hard- and floppy-disk 
controllers. 

The PCs measure 15 inches wide, 3.9 
inches high, and 14.9 inches deep. They 
incorporate dual-speed fans. 

386N: Reader Service 48 
286N: Reader Service 49 


Cheaper Cray supercomputer follows Y-MP2 

Cray Research offers the Cray 
Y-MP2E as a cheaper successor to the 
Y-MP2 introduced in 1989. According to 
the company, the Y-MP2E delivers Cray 
Y-MP-class performance at prices starting 
at $2.2 million. The system’s compatibility 
with the Y-MP gives users access 
to that computer’s application 
codes, plus the ability to upgrade. 

Cray Y-MP2E systems employ standard 
Cray Y-MP memory and processor 
modules, plus the 6-ns clock cycle. They 
come in six configurations: one or two 
processors with 16, 32, or 64 million 
words of memory. They run the Unicos 
operating system, release 6.1 or later, 
based on AT&T Unix System V. They 
also support the company’s standard 
compilers, networking software, and user 
interface software. 

Reader Service 50 


Sun adds low-end, midrange Sparc systems 

Sun Microsystems has announced an 
entry-level addition to its Sparcstation 
line of RISC workstations with its 
Sparcstation SLC, plus a midrange addition 
to its Sparcserver line of RISC servers 
with its Sparcserver 470. 

The Sparcstation SLC has no base 
unit, instead packing the CPU components 
within a 17-inch monochrome display. 
The CPU board incorporates a 
20-MHz Sparc microprocessor, floating-point 
unit, 8-16 Mbytes of memory (using 
4-Mbyte SIMMs), one Ethernet port, 
monochrome frame buffer, audio, and an 
SCSI port. 

The board and 80W power supply are 
mounted behind the monitor. According 
to the company, the single power supply 
and the use of CMOS technology mean 
the unit does not need a cooling fan. 

The Sparcstation SLC reputedly delivers 
12.5 MIPS and 1.2 Mflops of performance. 
It supports from 104 Mbytes to 
2.7 Gbytes of external disk storage. 

The Sparcstation SLC with 8 Mbytes 
of memory, an internal audio speaker, a 
SunOS license, a keyboard, and a mouse 
lists for $4,995. 

The Sparcserver 470 configured with 
SCSI and/or intelligent peripheral interface 
disk subsystems functions as a 
computational engine, according to the 
company. With IPI subsystems, the server 
handles file and commercial database 
applications. 

The server uses a 33-MHz Sparc 
microprocessor and reputedly delivers 22 
MIPS and 3.8 Mflops of performance. It 
features I/O caches and a 64-bit, 
120-Mbps memory bus. 

The workstation version of the server, 
called the Sparcstation 470, employs 
Sun’s GX 2D and 3D graphics accelerator 
and a translation look-aside buffer. 

The Sparcserver 470 configured as a 
computation server with 32 Mbytes of 
memory, a 669-Mbyte SCSI drive, and a 
150-Mbyte tape drive lists for $59,900. 
Configured as a server with a 1-Gbyte 
IPI disk drive and a 150-Mbyte tape 
drive, the Sparcserver 470 costs $74,900. 
The graphics workstation configuration 
with a 669-Mbyte SCSI disk drive, 
19-inch color monitor, GX graphics accelerator, 
keyboard, and mouse costs 
$69,900. 

SLC: Reader Service 51 
470: Reader Service 52 
Graphics: Reader Service 53 


Bull beefs up DPX/2 line 

Bull HN Information Systems has 
added four models to its DPX/2 300 family 
of midrange, symmetric, multiprocessing 
computers. Model 360 employs 
Motorola’s 68040 microprocessor. 

Model 510 employs RISC technology. 
Model 110 is an entry-level system, 
while Model 220 joins it at the lower 
end. All four models operate under the 
Bull Open Software (BOS) environment, 
based on AT&T’s Unix System V. 

Model 360 supports up to four CPUs, 
16-576 Mbytes of main memory, 338 
Mbytes to 23 Gbytes of disk storage, and 
more than 300 simultaneous users. 
Communications processors support local- 
and wide-area networking. An entry-level 
Model 360 costs $36,000, while a 
typical configuration is $150,000. 

Model 510 results from a 1989 joint 
development agreement between Bull 
and Mips Computer Systems. Model 510 
incorporates the Mips R6000 RISC processor. 
It reportedly supports more than 
500 simultaneous users and 40 Gbytes of 
disk storage. An entry-level system costs 
$170,000, while a typical configuration 
would be about $300,000. Shipments are 
scheduled for the fourth quarter. 

Model 110 uses an Intel 80386SX 
CPU. This entry-level DPX/2 model runs 
applications based on MS-DOS and Unix 
and is source and binary code compatible 
with most Intel-based Xenix applications, 
according to the company. Scheduled 
for the third quarter, it will cost 
$5,200 for an entry-level model and 
$7,600 for a typical configuration. 

Model 220 incorporates a 25-MHz 
Motorola 68030 CPU. It supports 8-16 
Mbytes of memory, up to 3 Gbytes of 
disk storage, and up to 32 simultaneous 
users. An entry-level model costs 
$11,100, while a typical configuration is 
$25,000. 

Reader Service 54 


Company, Model, Function    Comments    R.S. No. 

Analogic 
ADC4300 
Sampling ADC 
A 16-bit, hybrid, 50-kHz sampling A/D converter with on-board reference. Contains a clock and 
scaling amp through which pin-selectable voltage ranges are provided. Output data in parallel 
and serial formats as complementary binary or complementary offset binary. Comes in a 32-pin 
DIP. Cost (100s): $182. 
R.S. No. 120 

Cirrus Logic 
CL-GD5320 
Graphics chip 
An enhanced VGA-compatible graphics chip that requires two 120-ns 256Kx4 DRAMs for VGA 
graphics capability in a PC. Includes an 8/16-bit CPU interface, independent video and DRAM 
clocks, internal multiple FIFO and page-mode access design, and four pages of memory for VGA 
Mode 13. Comes in a 100-pin quad flat pack. Now sampling. Cost: $30. 
R.S. No. 121 

Cyrix 
Fasmath EMC87 
Math processor 
A math processor compatible with Intel’s 80387. Implements a full extended double-precision 
IEEE-754-1985 architecture with parallel adder, multiplier, and exponent units. Uses one of two 
interface modes: 80387 or EMC. Comes in a 121-pin ceramic PGA. Cost: $994 for 33 MHz, $865 
for 25 MHz, $774 for 20 MHz. 
R.S. No. 122 

Fujitsu 
E10040VHM, E10160VHR 
Gate arrays 
ECL gate arrays with 40 Kbits of on-chip RAM (E10040VHM) and 160 Kbits of ROM 
(E10160VHR). Gate counts of 14,572 and 14,530, respectively. Speeds of 80 ps unloaded and 
250-350 ps loaded. Max fanout of 40. Come in 441-pin ceramic surface-mount PGAs with 
preattached heat sinks. Production in fourth quarter 1990. Cost (1,000s): less than $2,000. 
R.S. No. 123 

Hitachi America 
HM62832H 
SRAM 
A 256-Kbit SRAM organized 32-Kbitx8 with 25-ns access times. Jointly developed and manufactured 
by VLSI Technology and Hitachi. Also available in a low-power version 
(HM62832HL). Comes in JEDEC standard 300-mil plastic DIPs and SOJs. Cost (1,000s): $18. 
R.S. No. 124 

National Semiconductor 
NS32xxx 
Imaging processors 
NS32CG160 is an integrated system processor running at 15 MHz. NS32FX16 (15 MHz) is a 32-bit 
imaging/signal processor with software-programmable digital signal processing. 
NS32GX320 (20 MHz) is a 32-bit integrated system processor, the high-performance counterpart 
of the FX16. It also has software-programmable DSP. Cost (100s): $39.90 for CG160; 
$33.80 for FX16; $147 for GX320. 
R.S. No. 125 

Precision Monolithics 
PM-0820 
ADC 
A 2.5-µs 8-bit A/D converter with on-chip track and hold and an internal clock. Requires a 5 V 
supply. Has a digital interface. A second source to National Semiconductor’s ADC-0820 and 
Analog Devices’ AD7820. Comes in 20-pin plastic and ceramic DIPs. Cost (100s): starts at $7.10 
(extended industrial temperature). 
R.S. No. 126 

RAD 
RJ-010 
Error corrector 
An error-correcting IC that provides forward error correction and interleaving functions. Can 
correct up to three random errors in a 23-bit frame constructed from 12 data bits and 11 appended 
parity bits, or a burst error of up to 9-bit length. Pin-selectable random or burst correction mode. 
Implements the Golay (23,12) code. Functionally independent encoder and decoder. No prices given. 
R.S. No. 127 

Siemens 
SAB 82526 HSCX1 
Controller 
A high-level serial communications controller with transfer speeds of 4-6 Mbps. A single-channel 
version of the two-channel HSCX. Has a 64-byte FIFO per channel and direction, a DMA 
interface, and collision detection and resolution. Comes in a 44-pin PLCC. Cost (10,000s): $7. 
R.S. No. 128 

Sony 
E3Gxx 
Gate arrays 
Three ECL gate arrays with a gate delay of 200 ps and speeds up to 2 GHz. Model E3G1K has 1,000 
gates and power dissipation (85% gate utilization) of 3.3W. Model E3G2K has 2,000 gates and 
power dissipation of 5W. Model E3G4K has 4,000 gates and power dissipation of 8.2W. Cost 
(1,000s): $80 for E3G1K, plus $40,000 NRE; $150 for E3G2K, plus $50,000 NRE; $300 for 
E3G4K, plus $57,000 NRE. 
R.S. No. 129 

Texas Instruments 
TACT82411 
AT chip 
An AT chip operating at speeds up to 20 MHz for IBM PC-AT compatibles. Integrates system 
logic and most peripheral functions, including interrupt controller, timer, and real-time clock. 
Features separate CPU and AT bus clocks; software configuration for wait states, command delays, 
and memory organization; and single-bank page mode. Comes in a 208-pin quad flat pack. 
No prices given. 
R.S. No. 130 

Weitek 
XL-8220 
Hyperscript processor 
A single chip for Postscript-based page printer controllers. Part of the XL family of RISC-based 
page printer processors. Has a 32-bit RISC architecture that operates up to 25 MHz. Functions 
on-chip include a code cache, DRAM controller, interlock queues, printer FIFO, and printer channel 
control logic. Production in fourth quarter 1990. Cost: $99 (16 MHz) or $149 (25 MHz). 
R.S. No. 131 


Microsystem Announcements 


Company, Model, Function Comments R.S. No. 


Analogic 
MSP-6C30 
Array processor 
A floating-point VME array processor optimized for digital signal processing and imaging applications. 
Uses single or dual Texas Instruments TMS320C30 processors. Can be configured with 
up to 1 Mbyte of SRAM and 32 Mbytes of DRAM. Comes on a 6U VME Eurocard. Cost: $5,000. 
R.S. No. 135 

MM-96 
Multimedia board 
A multimedia board based on Motorola’s 96002 Media Engine floating-point processors. Performs 
IEEE-format floating-point calculations at a peak rate of 100 Mflops. Has DSPnet and 
DT-Connect interfaces. Available in configurations with one or two 96002s and 1-16 Mbytes of memory. 
Cost: $3,995 for one 96002 with 1 Mbyte; dual 96002 versions start at $5,995. 
R.S. No. 136 

Burr-Brown 
ZPB3210 series 
DSP boards 
Two 32-bit digital signal processing boards on the VMEbus. Based on the AT&T WE DSP32C 
(80 ns) processor. ZPB3211 uses a single DSP32C at 50 MHz with 64 Kbytes of SRAM and buffered 
serial I/O ports. ZPB3212 has two DSP32C processors, 64 Kbytes of SRAM each, and full 
buffered serial I/O ports. Both come in standard 6U VME format. Cost: $3,495 for ZPB3211; 
$5,495 for ZPB3212. 
R.S. No. 137 


Dakota Microsystems 
Powersave 
UPS card 
An uninterruptible power supply card that installs inside a PC. Provides DC power directly to the 
motherboard, saves conventional memory to the hard disk, and shuts down the system when AC 
power fails. Upon restoration of power, restores the computer’s state. Comes with NiCad batteries. 
Cost: $339.95. 
R.S. No. 138 


Everex Systems 
Step 486/33 
PC 
A PC based on Intel’s i486 CPU. Features a dual-level caching system (8 Kbytes internal and 128 
Kbytes external cache) and supports the 486’s burst mode. Includes Programmable Configuration 
Select. Supports up to 64 Mbytes of 32-bit memory. Cost: $10,499 (20 MHz) or $8,999 (25 
MHz). 
R.S. No. 139 


GammaLink 
Gammafax CP 
Fax board 
A fax board supporting 16 lines. An extension of the original 8-line CP fax board. Allows text and 
graphics mixed on one page. Provides polling and turnaround polling, fax sanitization, and automatic 
fill. Users can put up to 16 CPs in a single computer, with 9,600 bps send and receive. Cost: 
$1,095. 
R.S. No. 140 


Mips Computer Systems 
Magnum 3000 
Workstation 
First in a series of RISC workstations. Based on the 25-MHz Mips R3000 CPU. Uses the 
RISCwindows graphics environment, Mips’ version of the OSF/Motif graphical user interface, 
and X Windows. Available in various configurations, from boards up. Cost: starts at $8,990 for a 
diskless version with 8 Mbytes of RAM. 
R.S. No. 141 


NCR 
PC486/MC33 
PC 
A 33-MHz i486 Micro Channel-based PC with NCR’s Super Video Graphics Array and SCSI 
technology. Incorporates an NCR-designed Micro Channel chip set, including the processor 
interface controller, dual port memory controller, memory error detection and correction, DMA, 
I/O controller, and MCA bus controller. Cost: $14,195 (basic: 4-Mbyte RAM, 1.44-Mbyte 
3.5-inch floppy disk drive). 
R.S. No. 142 


PEP Modular Computers 
VMPM68KD 
CPU board 
A VMEbus CPU board with a 32-bit Motorola 68030 microprocessor, 1-3 Mbytes of dual-ported 
SRAM with backup, up to 512 Kbytes of ROM, and an optional 68881 floating-point coprocessor. 
Comes on a single-height Eurocard. Cost (OEM quantities): $2,925 (16 MHz) or $4,050 (25 
MHz). 
R.S. No. 143 


Spectrum 
ADSP-2101 
DSP system board 
A PC-based system board built around Analog Devices’ ADSP-2101 signal processor. Provides 
8 Kwords of zero wait state memory, a dual-channel acquisition interface, and connectors for serial 
and parallel communications. Features an instruction cycle time of 80 ns. Also has 32 Kbytes 
of EPROM. Cost: starts at $1,995. 
R.S. No. 144 


Yarc 
Nusuper/30 
Coprocessor 
A Macintosh II RISC coprocessor system. Includes an Am29000 32-bit RISC CPU with separate 
data, instruction, and address buses; a 60-MHz system clock; up to 10 Mbytes of on-board memory 
and direct access to 32 Mbytes of DRAM on the NuBus; and multibank, burst, and interleaving 
memory. Six configurations. Cost: from $3,995 to $7,395. 
R.S. No. 145 


Zeos International 
Zeos 486 
PC 
A PC based on Intel’s 25-MHz i486 CPU. Features an EISA bus design. Comes with or without a 
secondary 128-Kbyte cache. Complete systems include memory, disk drives, and video. Cost: 
starts at $4,995. 
R.S. No. 146 


Editor: Edmund L. Gallizzi, Computer Science Dept., Eckerd College, St. Petersburg, FL 33733, phone (813) 864-8272, Compmail+ e.gallizzi 


Future directions focus of Transaction Machine Architecture Workshop 

Martin Freeman, Philips Research, Sunnyvale 
Hector Garcia-Molina, Princeton University 
Randy Katz, University of California at Berkeley 


Mapping future directions in transac¬ 
tion processing — both in terms of hard¬ 
ware and software — was set down as the 
objective of the first International Work¬ 
shop on Transaction Machine Architec¬ 
ture (TMA-I) when the meeting con¬ 
vened September 25-28, 1988. General 
Chair Martin Freeman and Program 
Chair Dieter Gawlick provided the out¬ 
line for the Lake Arrowhead, California, 
workshop sponsored by the IEEE Com¬ 
puter Society Technical Committees on 
Microprocessors, Data Engineering, and 
VLSI, in cooperation with ACM 
SIGMOD. 

Transaction processing systems 
(TPSs) constitute a significant fraction of 
the data processing market. They are in 
widespread use in supporting airline res¬ 
ervations, tracking banking operations, 
and managing inventories; and they make 
extensive use of large numbers of disks 
and I/O channels. However, as hardware 
and its capabilities improve, some of the 
basic premises of transaction processing 
must be reconsidered. 

It may no longer be best to have data re¬ 
side on conventional disks. Instead, data 
can be resident in nonvolatile main mem¬ 
ory, or it can be fragmented over many 
very small disks to improve access time. 
Faster processors make it possible to in¬ 
clude new functionality in a TPS; for ex¬ 
ample, we might want the best fare to go 
from city A to B on any one of several 
days, on any airline, and making any 
small number of intermediate stops. 

Large global networks make it feasible to 
interconnect and integrate different ser¬ 
vices. 

Current TPSs have shown that it is pos¬ 
sible to reliably process large numbers of 
transactions — even more than 1,000 per 
second. But users are demanding more 
services. An airline reservation system 
not only must make reservations for 
flights, but must reserve hotel rooms, line 
up rental cars, give travel advice, com¬ 
pute the cheapest fares, find lost luggage, 
and so on. 

Looking into the future, it is not hard to 
imagine a world with no cash and no paper 
records. A soda is purchased from a 
machine with a debit card, not with quarters. 
Papers are submitted electronically 
to journals; newspapers are delivered 
electronically. So future TPSs will have 
to process many more and more varied 
types of transactions. 


The First International Workshop 
on Transaction Machine Architectures 
was meant to bring together researchers 
and practitioners to discuss 
the future directions of transaction 
processing and produce a monograph 
based on submitted workshop 
articles and conference discussions. 
Some 20 months after the 
fact, no published workshop record 
is in sight. Nonetheless, the 
editor-in-chief of Computer felt the following 
abridged workshop record, 
which is still timely, would garner 
considerable general interest 
among our readers. (Ed.) 


Current TPSs process relatively 
simple user requests called transactions 
(for example, “deposit money in my bank 
account”). Transactions can update an 
on-line database and the changes are re¬ 
flected immediately. These TPSs are 
homogeneous, that is, they are controlled 
by a single organization and run on a 
single brand of processors. Such systems 
guarantee three properties: 

(1) transactions are atomic (that is, 
they are not left half done); 

(2) schedules are serializable (that is, 
concurrently executing transactions do 
not interfere with each other); and 

(3) database updates are persistent 
(that is, an update is not forgotten the next 
day). TPSs are accessible by networks of 
terminals, with transactions originating 
at these terminals. 
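
The three guarantees listed above can be illustrated with a small sketch. It is a toy, not a description of any system discussed at the workshop: the class name TinyTPS, the transfer helper, and the in-memory "log" are all assumptions made for illustration. Atomicity comes from applying updates to a private copy and installing the copy only on success, serializability from running one transaction at a time under a lock, and persistence is merely imitated by appending each committed state to a list that stands in for a log on disk.

```python
import threading


class TinyTPS:
    """Toy transaction processor (hypothetical names, in-memory only)."""

    def __init__(self, balances):
        self._balances = dict(balances)  # committed state
        self._lock = threading.Lock()    # one transaction at a time: trivially serializable
        self.log = []                    # stand-in for a persistent log on disk

    def run(self, transaction):
        """Run transaction(working_copy); commit all of its updates or none."""
        with self._lock:
            working = dict(self._balances)  # updates go to a private copy
            try:
                transaction(working)
            except Exception:
                return False                # abort: committed state is untouched
            self.log.append(dict(working))  # record the new state before exposing it
            self._balances = working        # commit
            return True

    def balance(self, account):
        return self._balances[account]


def transfer(src, dst, amount):
    """Build a transaction that moves amount from src to dst."""
    def txn(accounts):
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount
    return txn
```

A transfer that would overdraw the source account raises inside the transaction, so run returns False and neither balance changes; the withdrawal is never left half done.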

However, as the notion of TPS is ex¬ 
tended to cover more applications and 


more diverse hardware, the definition of 
TPS becomes fuzzy. Some applications 
require a wider variety of transactions, 
for example, heavyweight or long-lived 
transactions. Future TPSs will be hetero¬ 
geneous collections of computers and da¬ 
tabase systems. Thus, the distinction be¬ 
tween TPSs and, say, database manage¬ 
ment systems, file systems, operating 
systems, information retrieval systems, 
or simply computer systems, becomes 
more blurred than it already is. 

Working groups 

As an integral part of the TMA-I work¬ 
shop, participants were divided into 
working groups and given a charter to de¬ 
termine the future of several aspects of 
transaction processing. The following 
are brief summaries of some of the 
groups’ findings and the discussions that 
followed. 

Secondary storage technology. This 
group was chartered to study current and 
future trends in secondary storage tech¬ 
nology for TPSs. It discussed a range of 
ideas, including solid-state disks, disk ar¬ 
rays, disk farms, and replicated disks. A 
disk farm or array replaces a large disk 
with a large number of smaller disks. The 
smaller disks have the potential to offer 
lower storage costs and higher band- 
widths (by having more heads that read in 
parallel). 

The difference between an array and a 
farm is that, in an array, a single file can be 
spread over multiple disks (disk striping) 
and, in a farm, most files reside on a single 
disk. Replication can be used in farms or 
arrays to improve reliability and availa¬ 
bility. Mirrored disks are a pair of identi¬ 
cal full disks, but there are many other op¬ 
tions such as replicating parts of a disk 
only, making more than two copies, and 
storing parity bits for striped disks. 
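
Striping with parity, as mentioned above, can be sketched in a few lines. This is an illustrative toy under stated assumptions: disks are modeled as Python lists of byte blocks, one extra disk holds the XOR parity of each stripe, and the names `stripe`, `rebuild`, and `xor_blocks` are invented for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe(data, n_disks, block_size):
    """Spread data round-robin over n_disks data disks plus one parity disk."""
    stripe_bytes = n_disks * block_size
    padded = data + bytes(-len(data) % stripe_bytes)  # zero-pad the last stripe
    disks = [[] for _ in range(n_disks + 1)]          # the last list is the parity disk
    for start in range(0, len(padded), stripe_bytes):
        blocks = [padded[start + i * block_size:start + (i + 1) * block_size]
                  for i in range(n_disks)]
        for disk, block in zip(disks, blocks):        # zip stops before the parity disk
            disk.append(block)
        disks[-1].append(xor_blocks(blocks))
    return disks

def rebuild(disks, failed):
    """Recompute the blocks of one failed disk from all surviving disks."""
    survivors = [disk for i, disk in enumerate(disks) if i != failed]
    return [xor_blocks(row) for row in zip(*survivors)]
```

Because XOR is its own inverse, any single disk (data or parity) can be recomputed from the others, which is why parity costs far less storage than full mirroring.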

The concept of solid-state disks was
also discussed. This is an electronic storage
unit that has a disk interface. A solid-state
disk can also contain a conventional
magnetic disk. If a long-lasting power 
failure occurs, the data is migrated to the 
disk (using batteries) before the unit is 
powered off. The electronic storage may 
have less capacity than the disk, in which 
case we have a so-called disk cache. Both 
disk caches and solid-state disks appear 
to the central process as a conventional 
disk. 
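
The behavior described above (a small electronic store in front of a magnetic disk, with dirty data migrated to the disk on power failure) might be sketched as follows. This is a simplified model, not a description of any product: the backing disk is a plain dictionary, the eviction order is least recently used, and `CachedDisk` and `power_fail` are hypothetical names.

```python
class CachedDisk:
    """A small electronic store that looks like a disk to its caller."""

    def __init__(self, backing, capacity):
        self.backing = backing    # dict: block number -> bytes (stands in for the magnetic disk)
        self.capacity = capacity  # number of blocks the electronic store can hold
        self.cache = {}           # dicts keep insertion order, so the first key is the oldest
        self.dirty = set()        # blocks written but not yet on the backing disk

    def _touch(self, block_no, data):
        """Mark a block as most recently used and evict the oldest if over capacity."""
        self.cache.pop(block_no, None)
        self.cache[block_no] = data
        while len(self.cache) > self.capacity:
            victim = next(iter(self.cache))
            victim_data = self.cache.pop(victim)
            if victim in self.dirty:          # write back before forgetting the block
                self.backing[victim] = victim_data
                self.dirty.discard(victim)

    def read(self, block_no):
        data = self.cache[block_no] if block_no in self.cache else self.backing[block_no]
        self._touch(block_no, data)
        return data

    def write(self, block_no, data):
        self.dirty.add(block_no)
        self._touch(block_no, data)

    def power_fail(self):
        """Migrate every dirty block to the backing disk (the battery-powered flush)."""
        for block_no in sorted(self.dirty):
            self.backing[block_no] = self.cache[block_no]
        self.dirty.clear()
```

The caller only ever sees `read` and `write`, which mirrors the point that both disk caches and solid-state disks present themselves as a conventional disk.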

The new storage architectures raise 
many challenging research questions. 
These include how to partition the data 
over multiple disks, how to decide what 
to replicate and how many copies to 
make, the performance impact of solid- 
state disks, whether disk caches make 
more sense than conventional caches (in 
main memory), and the best types of user 
interfaces to these new storage subsys¬ 
tems. 

The group predicted that, for the 
1990s, a storage system would usually 
consist of a hierarchy of technologies. 
Solid-state disks would be at the top, fol¬ 
lowed by various disk arrays or farms, 
followed by various automatic archival- 
tape or optical devices, and finally in¬ 
cluding manually loaded archival de¬ 
vices. 

Transaction system architecture. 

This group was asked whether transac¬ 
tions will be used in all types of systems, 
whether transaction support is a job for 
the operating system or the hardware, and 
to discuss future requirements and open 
problems. 

When the term “TP monitor” was intro¬ 
duced, there was heated discussion. The 
term meant different things to different 
people. One group defined TP monitor as 
a telecommunications monitor — the 
component of TPSs that acts as a concen¬ 
trator process for terminals. It success¬ 
fully reduces the number of open connec¬ 
tions TPSs must manage, and it was 
claimed that all successful TPSs have this 
version of TP monitor. There was no 
agreement as to whether future systems 
would have to have TP monitors. 

Some participants argued that future 
computers could handle large numbers of 
connections and that, in the future, dumb 
terminals would be replaced by personal 
computers that are easier to manage. It 
was also argued that TP monitoring is a 
job for the operating system or the com¬ 
puter communication network, not for 
TPSs. On the other hand, other partici¬ 
pants insisted that TP monitors are suc¬ 
cessful as is, and that they would continue 
to provide important future TPS services. 

The second definition of TP monitor
was transaction processing monitor, that 
is, the component of TPSs that manages 
transactions. This TP monitor handles 
two-phase commit, logging, and other 


such functions. It provides the glue be¬ 
tween the database system, the operating 
system, the communications system, the 
naming service, and the authentication 
service. The application program should 
not be the one to tie these services to¬ 
gether; rather, that should be the job of the 
TP monitor. 
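
Two-phase commit, one of the functions attributed above to the transaction processing monitor, can be sketched as a coordinator that first collects votes and then broadcasts a single decision. This is a minimal in-process illustration: `Participant` and `two_phase_commit` are hypothetical names, and the durable logging, timeouts, and recovery that a real monitor needs are omitted.

```python
class Participant:
    """A resource manager (database, queue, ...) taking part in a transaction."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "active"
        self.log = []  # records each state change, standing in for a durable log

    def _enter(self, state):
        self.state = state
        self.log.append(state)

    def prepare(self):
        """Phase 1: promise to commit if asked, or vote no."""
        self._enter("prepared" if self.will_vote_yes else "aborted")
        return self.will_vote_yes

    def commit(self):
        self._enter("committed")

    def abort(self):
        if self.state != "aborted":
            self._enter("aborted")


def two_phase_commit(participants):
    """Return 'commit' only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]       # phase 1: voting
    decision = "commit" if all(votes) else "abort"
    for p in participants:                            # phase 2: same decision for all
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

The application only calls `two_phase_commit`; it does not coordinate the individual services itself, which is the "glue" role the group assigned to the monitor.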

Some dissenters in the audience ques¬ 
tioned whether the current transaction 
concept was the right one for providing 
this glue. In particular, some applications 
do not require all the properties of trans¬ 
actions (atomicity, serializability, per¬ 
sistence), so they will pay a high price if 
they use TP monitors. It was suggested 
that maybe remote procedure calls were 
better at providing the necessary glue. 

The presenting group argued that 
transactions will be a key concept in all 
computer systems, but that extensions 
would have to be provided, such as nested 
transactions and sagas (long-lived trans¬ 
actions). It was thought that the TP moni¬ 
tor could evolve to a general system that 
supervises activities, recovers them, and 
manages their resources. This monitor 
would have to handle images, voice, and 
other types of data. It might also have to 
include real-time facilities because in 
some systems failing to meet a deadline 
can be considered equivalent to violating 
the transaction properties. So, the abort 
or commit of transactions must be coordi¬ 
nated with its real-time requirements. 
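
A saga, as the term is commonly used, runs a long-lived activity as a sequence of steps, each paired with a compensating action that undoes it. The sketch below assumes that formulation; `run_saga` and the step functions in the usage note are invented for illustration and are not taken from the workshop material.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, the compensations of the steps that already
    completed are run in reverse order and False is returned.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

For example, a trip booking could pair "book flight" with "cancel flight" and "book hotel" with "cancel hotel"; if a later car rental step fails, the hotel and then the flight are cancelled. Unlike a conventional transaction, intermediate results are visible to others before the saga finishes, which is the price paid for not holding resources for the whole duration.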

There are still a number of important, 
difficult problems to be solved in this 
area, including 

(1) distributed nested transactions 
with internal parallelism, 

(2) data replication, 

(3) programming interfaces, 

(4) load balancing, 

(5) system administration, and 

(6) security and authentication. 

Large distributed systems. This 
group concluded that very large systems 
would appear, having possibly 10 s nodes 
or more. In their view, transactions would 
play a key role in such systems. It was ob¬ 
served that linear algorithms were essen¬ 
tial for large systems; nonlinear ones 
would not scale up. Large systems would 
also need to be hierarchically organized 
to reduce their complexity. 

Heterogeneous systems will be the 
norm, and thus standards will be very im¬ 
portant. Such standards must address se¬ 
curity, authentication, naming, and com¬ 
mit issues. Applications standards, that 
is, ones that deal with a particular appli¬ 
cation such as banking, will also be re¬ 
quired. 

It was observed that standards some¬ 
times (some would say often) enforce 
antiquated procedures, that is, provide a 


least common denominator of existing 
systems. Private protocols used by companies
may be better in that they are more
modern and less complex. They can be
used in local or homogeneous domains, 
with protocol converters used between 
the private domains and the global stan¬ 
dards. 

Security was singled out as one area 
where standards are sorely needed. Na¬ 
tional security policies will clearly affect 
TPS security standards, but it is not clear 
whether security standards or codes de¬ 
veloped in other domains will be directly 
applicable to transaction processing. 

Some additional remarks made by the 
group include 

(1) The larger a system becomes, the 
harder it becomes to change an interface; 
it seems we can never remove specifica¬ 
tions to an existing interface, we can only 
add. 

(2) IBM’s peer-to-peer communica¬ 
tion protocol LU6.2, which includes vari¬ 
ous forms of commit, appears to be an 
emerging de facto standard. 

(3) Bulletin boards that service hun¬ 
dreds of millions of users will appear, but 
new data models are required for such 
systems, as well as strategies for filtering 
and controlling postings. 

(4) The only way to manage large sys¬ 
tems is to break them up into collections 
of smaller, manageable systems. 

Processor technology. This group at 
TMA-I initially considered the fre¬ 
quently quoted assumption that “hard¬ 
ware will be free” in the near future. This 
conclusion was reached by observing 
that processor and storage costs are drop¬ 
ping rapidly while, at the same time, per¬ 
formance and storage capacity are im¬ 
proving dramatically. Thus, the hard¬ 
ware cost will be negligible compared to 
the software cost. 

These arguments are usually coun¬ 
tered by noting that the services required 
by applications are also increasing, so 
that hardware costs do not drop. In par¬ 
ticular, databases are growing in size, so 
the storage cost does not drop, even if the 
cost per bit does. 

One contingent took the position that 
the cost of TPS software will really drop. 
However, detractors pointed out that, 
while TPS software for systems based on 
personal computers appears to be cheap, 
the price for execution of that software 
may be even higher than that for pricey 
TPS packages. As more functionality is 
added to TPSs, old codes will have to be 
thrown out and new ones written — some¬ 
thing sure to cost. Thus, it was claimed 
that the cost of TPS software was likely to 
remain high, for example, hundreds of 
thousands of dollars per license. 









There was a heated discussion when 
the topic of multiprocessors was intro¬ 
duced. It is clear that TPSs have, and will 
continue to have, multiple processors, 
but the key question is whether or not it 
makes sense to interconnect clusters of 
them through shared memory. One camp 
argued that future processors (more than 
100 MIPS) will saturate memory by 
themselves, so it is best to have private 
memories (the so-called shared-nothing
approach). Many applications prefer the 
single processor model and are easier to 
implement with private memory. More¬ 
over, cache consistency necessitates 
complex hardware and slows down pro¬ 
cessors. 

The advocates of shared memory 
multiprocessors contended that many 
applications are hard to partition among 
loosely coupled processors and fit more 
easily within the shared memory frame¬ 
work. Some tasks (such as large joins) 
may require more horsepower than what a 
single processor can offer. Such tasks re¬ 


quire fine-grained parallelism that works 
best with shared memory (otherwise 
communication costs would be too high). 

The group presented some suggestions 
for hardware support for transaction pro¬ 
cessing. The ideas included support for 
protection (for example, capabilities or 
tags), improved I/O bandwidth, nonvola¬ 
tile main memory, and fast context 
switching mechanisms. The suggestion 
for protection hardware to support capa¬ 
bilities was not well received (“It was 
tried in software and did not work, so why 
do it in hardware?”). There seemed to be 
opposition to any type of special-purpose 
hardware; faster standard processors will 
be able to do the job. 

I/O processors (like the channels used
by IBM) were the last topic covered. An
argument in favor of I/O processors is 
that processing power should be close to 
the data, performing such basic functions 
as filtering. But it was pointed out that 
there is no real reason for functionally 
segregated processors. The I/O processor 


could well be a general-purpose proces¬ 
sor residing close to the data. Following 
this line of reasoning leads one to the 
shared-nothing architecture: each pro¬ 
cessing unit contains disks and a fast pro¬ 
cessor. The units are then connected via a 
fast network. In this view, there is no need 
for I/O processors. Even if their hardware 
is cheap, I/O processors need special 
software and complicate system design. 

Conclusion 

Next generation TPSs are looming on 
the horizon. They will have advanced 
multiprocessors, large memories, and 
high-density archival storage devices. 
They will provide access and services at 
thousands of heterogeneous sites. 

In addition to a conventional transac¬ 
tion interface, they will provide more 
flexible options for managing activities. 
They will operate continuously, with 
little human intervention. Building these 
systems is the challenge facing us. 


‘We want to write less code,’ asserts symposium keynoter 


Ware Myers, Contributing Editor 

“Clearly, one of the places we want to 
get is where we will write less code,” said 
John F. Kramer when he addressed the 
keynote audience at the Symposium on 
Environments and Tools for Ada in Re¬ 
dondo Beach, California, April 30. Kra¬ 
mer, program director of the STARS 
(Software Technology for Adaptable, 
Reliable Software) and Software Engi¬ 
neering Institute programs in DARPA’s 
Information Sciences and Technology 
Office, cited more software reuse as one 
of the paths toward the goal of writing 
less code. 

Later, John Favaro of the European 
Consultants Network distinguished be¬ 
tween “reuse in the large” and “reuse in 
the small.” The former “involves the use 
of large, self-contained packages such as 
spreadsheets, databases, even operating 
systems,” he said. “An application is 
built entirely around the packages to the 
extent that the resulting product is often 
even described in those terms (an ac¬ 
counting system built around Database 
X).” 

In contrast, reuse in the small is ori¬ 
ented toward small components that are 
not intended to stand on their own, he 
went on. “They are woven into the fabric 
of a new application.” 

Favaro made this distinction between 


the large and the small because he said his 
experience convinced him that the prob¬ 
lems are different. “Reuse in the small, 
where components must be adapted again 
and again to new situations, is still domi¬ 
nated by many thorny technical prob¬ 
lems,” he asserted. 

These problems made up one of the 
central threads of this symposium. Kra¬ 
mer referred to them in his keynote ad¬ 
dress; several of the formal papers dealt 
with them; and one of the three working 
groups, occupying the symposium’s af¬ 
ternoons, concentrated on them. 

Thorny problems. Tens of thousands 
of sequences of code that are already em¬ 
bedded in existing software might poten¬ 
tially be used again. But these sequences 
may be “spaghetti twisted” into their 
original programs; they may be riding on 
different operating systems and hard¬ 
ware; their interfaces to other modules 
may be nonstandard. 

“The common approach has been to fill 
libraries with large numbers of existing 
components with the assumption that 
these components will be useful for a 
wide variety of applications,” Larry La- 
tour of the University of Maine said. “Our 
premise is that components do not exist in 
a vacuum. Rather, they exist as part of in¬ 


tegrated generic software architectures, 
each of which can be instantiated within a 
particular context to form executable sys¬ 
tems.” 

Latour outlined a framework for thinking
about such components and a methodology
for implementing them. The framework
“is based upon the 3Cs model of a
software component: its concept, represented
by an abstract specification; its
content, represented by a family of imple¬ 
mentations, or generic architecture; and 
the context in which it is used in an envi¬ 
ronment.” 

To explain Latour’s ideas, in Ada the 
fundamental program unit is called a 
package. It consists of a specification and 
a body. The specification tells what the 
package does. The body, which hides the 
implementation details, does the work. A 
software engineer interested in reuse 
would thus be able to ascertain from the 
specification what the package does. 
Moreover, Ada provides a type of pack¬ 
age called the generic program unit in 
which the body contains the template for 
an algorithm. To apply this algorithm, the 
user must instantiate, or create, a particu¬ 
lar instance of it. 
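
The split between specification, body, and generic instantiation can be approximated outside Ada as well. The following is a loose Python analogy, not Ada and not Latour's methodology: an abstract class plays the role of the specification (the concept), concrete classes play the role of bodies (the content), and choosing a type parameter stands in for instantiation in a particular context. The stack classes are hypothetical examples.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(ABC, Generic[T]):
    """Concept: the abstract specification, saying what a stack does."""

    @abstractmethod
    def push(self, item: T) -> None: ...

    @abstractmethod
    def pop(self) -> T: ...

    @abstractmethod
    def __len__(self) -> int: ...


class ListStack(Stack[T]):
    """Content: one implementation; the list is a hidden detail."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


class BoundedStack(ListStack[T]):
    """Content: a second member of the same family, with a capacity limit."""

    def __init__(self, capacity: int) -> None:
        super().__init__()
        self._capacity = capacity

    def push(self, item: T) -> None:
        if len(self) >= self._capacity:
            raise OverflowError("stack is full")
        super().push(item)
```

Writing `ListStack[int]()` or `BoundedStack[str](capacity=8)` corresponds roughly to instantiating a generic unit for a given context. Python checks the type parameter only through external type checkers, whereas Ada enforces it at compile time.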

Latour is combining the 3Cs model
with this generic idea, or genericity, and
an analytical approach he calls “the separation
of concerns” to develop a methodology
for designing and implementing
reusable components.

Other “thorny” problems discussed in 
various papers and the working group on 
methods and tools for specification, de¬ 
sign, and reuse include 

• Need for higher-level frameworks, 
for example, standard interfaces between 
components, so that reusable compo¬ 
nents, whatever their source, can be de¬ 
signed to fit a known interface. 

• Need for methods to test and validate 
reusable components, initially as sepa¬ 
rate elements and ultimately as part of the 
system to which they are attached. 

• Need for ways to store and access, not 
just the machine code or even the source 
code, but also specifications, design in¬ 
formation, and other documentation per¬ 
taining to the reusable component. 

• The corresponding need for a com¬ 
mon integrated programming support 
environment that enables the user to ma¬ 
nipulate all this information. 

• Finally, the need for places, called 
repositories or libraries, to warehouse 
reusable components. 

In this connection, Kramer said that 
STARS is reducing the two repositories it 
has been operating, one at Boeing and one 
by SAIC (Science Applications Interna¬ 
tional Corporation) for IBM, to the one at 
SAIC. “What we really need to work to¬ 


ward is people putting components into a 
repository and trying to compose pro¬ 
grams out of them,” he said. “We are 
clearly in the experimental level of this 
technology. We don’t know what the 
right levels of abstraction are. We need a 
common users’ model of a repository and 
we need to develop common interfaces.” 

A case study of reuse. Participants in a 
reuse project at Intecs Sistemi, Pisa, It¬ 
aly, surprised Favaro. Instead of being 
unwilling to use components that they did 
not design and implement themselves, 
the engineers displayed great willingness 
to take components out of the repository. 
“The really big surprise, however, had an 
entirely different, more insidious na¬ 
ture,” Favaro reported. “They thought it 
would be easy.” 

It was not. “One contributing factor 
was the low level of training in modern
software engineering concepts we found 
among the engineers,” Favaro said. 
“There was little, if any, prior knowledge 
of such essential concepts as abstract data 
types, genericity, and object-oriented 
design.” Favaro believes there is far too 
little appreciation in the software engi¬ 
neering community of the intrinsic tech¬ 
nical difficulty of reuse. 

Favaro also applied an economic 
model of reuse developed by the Soft¬ 
ware Productivity Consortium to the Ital¬ 
ian project. The model deals with such 
variables as the percentage of code con¬ 


Ada well suited for use by handicapped 


“The host of computer gadgets I use
certainly makes my day-to-day living
easier,” software engineer Eugenie
(Jolle) Mason, who is blind, said in an
interview at the Symposium on Environments
and Tools for Ada. But, she went
on, “I am interested in more than just
having the handicapped use computers.
I am interested in having the handicapped
employed as professional programmers.”

Mason makes two central points. 

First, Ada’s characteristics suit it for 
use by the handicapped. Second, be¬ 
cause Ada is new and there is a short¬ 
age of experienced programmers who 
know it well, there is a better opportu¬ 
nity for handicapped newcomers to 
break into the field. 

A disability, such as blindness, 
means that the amount of information 
that can be transmitted from the com¬ 
puter system to the handicapped per¬ 
son is less than can reach a person with 
full sight. Mason pointed out that Ada’s 
design partly offsets this difficulty be¬ 
cause 


• It is not abbreviated or condensed.
Variable types are not abbreviated.
There are no “char’s” or “int’s.” Ada
uses “character” and “integer.” Real
words can be pronounced by a voice
synthesizer.

• Punctuation retains much of its 
original meaning. It is not used for modi¬ 
fying. For example, Ada’s double dot is 
much like an ellipsis in English, Mason 
explained. Its double dash for com¬ 
ments is also standard usage in writing. 

• Ideas are organized by levels of ab¬ 
straction in an Ada program. These lev¬ 
els “break the program into steps that 
let the user move from a general over¬ 
view of the program to look at each sec¬ 
tion in detail,” Mason said. 

Published examples of Ada show “a 
preference toward pronounceable 
names and annotated code,” Mason 
noted. That means a voice synthesizer 
can make Ada programs understand¬ 
able by the visually handicapped. “Ada 
is not only a readable language, it is 
also a hearable one.” 


tributed by reusable components, the cost 
involved in integrating the component 
instead of simply developing it from 
scratch, the charge to use a component 
from a repository, and the cost of making 
a component reusable. 

Previous attempts to model the cost of
reuse have run into difficulties accounting
for the complexity of the components
relative to each other, Favaro said. So, he
classified components in terms of com¬ 
plexity on five dimensions. He eventu¬ 
ally reduced the output of this analysis to 
three 5x3 spreadsheets. 

The relative cost of integrating compo¬ 
nents ranged from 1.10 for the simplest 
type of component to 1.63 for the most 
complex. The relative cost of reusable 
component production varied from 1.20 
to 4.80. The payoff threshold values for 
the 15 categories ranged from 1.33 to 
12.97. That means that several of the less 
complex categories on the low end of 
Favaro’s scale would be amortized after 
approximately two uses. In the middle of 
his complexity scale, the median payoff 
value would be between three and four 
uses. The highest four categories would 
range from six to 13 uses. 
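
One simple break-even reading reproduces the quoted endpoints, and is offered here only as an assumption about how the figures relate. Take the cost of writing a component from scratch as 1. If making it reusable costs E once and each reuse costs a fraction b to integrate, then each use saves 1 - b and the investment is recovered after E / (1 - b) uses. This matches the quoted thresholds if the integration costs are read as b = 0.10 and b = 0.63 rather than the printed 1.10 and 1.63; the article itself does not state the formula, and `payoff_threshold` is a hypothetical helper.

```python
def payoff_threshold(production_cost, integration_cost):
    """Number of uses after which building a component for reuse pays off.

    Both costs are relative to developing the component once from scratch (1.0).
    Assumed model: one-time cost E, per-use cost b, break-even at E / (1 - b).
    """
    if not 0.0 <= integration_cost < 1.0:
        raise ValueError("integration must cost less than rewriting from scratch")
    return production_cost / (1.0 - integration_cost)

# Endpoints quoted in the text, under the assumption described above.
simplest = payoff_threshold(1.20, 0.10)      # about 1.33 uses
most_complex = payoff_threshold(4.80, 0.63)  # about 12.97 uses
```

Under this reading, the steep rise at the complex end comes from both factors at once: the one-time cost grows fourfold while the saving per use shrinks from 0.90 to 0.37.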

Considering the difficulties in making 
this analysis, Favaro cautioned that “it 
would be foolish to read these figures as 
gospel, but the high numbers are hard to 
rationalize away. Expectations of early 
payoff for development for reuse may 
have to be revised upward.” 


At present, there are relatively few 
handicapped programmers because 
the methods used by computer sys¬ 
tems make it difficult for people with a 
disability to program. The characteris¬ 
tics of the Ada language outlined 
above make it easier for the visually 
handicapped to comprehend an Ada 
program. 

“Knowing Ada will improve the odds
on getting a job,” Mason believes, “because
the handicapped programmer
does not have to compete with others
with the same prior experience but no
handicap.” Few have a lot of experience
with Ada. Moreover, civilian
agencies, such as NASA and the FAA, 
as well as many private organizations, 
are beginning to use the language. 

Therefore, Ada is providing a win¬ 
dow of opportunity for the handi¬ 
capped for the next few years. There 
will be many Ada jobs available, Ma¬ 
son said, but fewer experienced Ada 
programmers competing for them than 
is usually the case with older lan¬ 
guages. — Ware Myers 
CALL FOR PAPERS


IEEE Micro seeks manuscripts for general-interest
issues in 1991. Topics of
particular interest range from artificial intelli¬ 
gence and biological computing to VHDL de¬ 
sign and workstations. Submit manuscript to 
Joe Hootman, EE Dept., Univ. of North Da¬ 
kota, PO Box 7165, Grand Forks, ND 58202, 
phone (701) 777-4331. 


1991 IEEE Computer Society VLSI 
Workshop: Feb. 1991, Orlando, Fla. 
Sponsor: IEEE Computer Society Technical 
Committee on VLSI. Submit paper to Len Ber¬ 
man, IBM T.J. Watson Research Center, PO 
Box 218, Yorktown Heights, NY 10598, phone 
(914) 945-1213, fax (914) 945-2141, e-mail 
berman@ibm.com.


Int’l J. of Applied Intelligence will begin 
quarterly publication in January 1991 and 
seeks manuscripts. Write to Karen S. Cullen, 
Kluwer Academic Publishers, 101 Philip Dr., 
Norwell, MA 02061 for author instructions. 

Iecon 91, 17th Conf. of the IEEE Industrial 
Electronics Society: Oct. 28-Nov. 1, 1991, 


Call for papers and referees for Computer 


Computer Generated Music has been selected as the 
theme for the July 1991 edition. The issue will be devoted to 
examining the driving forces in the field from a computational 
standpoint, assessing the limits of computer music in the gen¬ 
eral music field, and discussing future desirable directions. 
See the April 1990 issue of Computer (p. 127) for complete
information.

Abstracts are due by
August 30, 1990, and
four copies of the full
manuscript and four audio
cassettes are due by
October 30, 1990. Notification
of acceptance is set
no later than December
31, 1990, and the final version
of the manuscript is
due no later than March
30, 1991.

Submissions should be
sent to Denis Baggi, Istituto
Dalle Molle per Studi
sull’Intelligenza Artificiale,
Corso Elvezia 36,
6900 Lugano, Switzerland,
phone 41 (91) 56 15 78,
Europe electronic mail
denis%idsia.uucp@chx400.switch.ch,
US e-mail baggi@berkeley.edu.


For submittal to Computer, manuscripts must not have 
been previously published or currently submitted for publi¬ 
cation elsewhere. Each manuscript should be no more than 
32 typewritten, double-spaced pages long, including all 
text, figures, and references. Each submittal should in¬ 
clude a cover page that contains the title of the article, the 
full name(s) and affiliation(s) of the author(s), complete 
postal and electronic address(es) of all the authors as well 
as their telephone and fax number(s), a 300-word abstract, 
and a list of keywords identifying the central issues of the 
manuscript’s contents. The final manuscript should be ap¬ 
proximately 8,000 words in length and contain no more than 
12 references. 

If you are willing to review articles for any of these special 
issues, please send a note listing your research interests to 
Bruce Shriver, editor-in-chief of Computer or to one of the 
guest editors listed for the particular issue. Shriver may be 
reached at the University of Southwestern Louisiana, PO 
Drawer 42730, Lafayette, LA 70504-2730, phone (318) 
231-5811, fax (318) 265-5472, e-mail b.shriver on Comp- 
mail+ or shriver@usl.edu on Internet. 


Eight copies of the full manuscript are due by September
1, 1990. Notification of decisions is set no later than December
1, 1990, and the final version of the manuscript is due no later
than February 1, 1991.

Submissions and questions should be directed to either of 
the guest editors, Yann-Hang Lee, Computer and Information 
Science, University of Florida, Gainesville, FL 32611, phone 
(904) 392-1536, e-mail yhlee@cis.ufl.edu; or C.M. Krishna, 
Dept. of Electrical and Computer Engineering, University of
Massachusetts, Amherst, MA 01003, phone (413) 545-0766,
e-mail krishna@ecs.umass.edu.


Distributed Computing 
Systems has been selected 
as the theme for the August 
1991 issue. Prospective au¬ 
thors are invited to submit tu¬ 
torial, survey, descriptive, 
case-study, application ori¬ 
ented, or pedagogic manu¬ 
scripts. 

Topics of interest include, 
but are not limited to:


Real-Time Systems 

will be the theme of the May 1991 edition. Tutorial, survey, 
case-study, or pedagogic manuscripts are sought. 

Subtopics of interest include, but are not limited to: 

• Real-time languages: Special-purpose languages. 

• Real-time operating systems: Scheduling and reconfigu¬ 
ration techniques, real-time operating kernels, task alloca¬ 
tion. 

• Real-time architectures: Avionics and process-control 
architectures, case studies, architecture optimization. 

• Real-time communication: Token-ring and multiaccess 
protocols for real-time systems, synchronization techniques. 

• Performance/reliability analysis: Modeling techniques and 
software packages. 


• Distributed operating
systems (process synchronization,
deadlocks, scheduling,
load sharing, snapshots,
clock synchronization,
real-time systems, etc.).

• Tools and languages. 

• Fault tolerance, crash re¬ 
covery, and reliability. 

• Performance measure¬ 
ments and modeling. 

• Experimental systems and case studies. 

Abstracts are due by November 15, 1990, and the deadline
for the full manuscripts is January 1, 1991. Notification of
decisions is set no later than March 15, 1991, and the final version
of the manuscript is due no later than May 1, 1991.

Submittals and questions should be directed to either of the
guest editors, Mukesh Singhal, Dept. of Computer and Information
Science, Ohio State University, Columbus, OH 43210,
phone (614) 292-5839, e-mail singhal@cis.ohio-state.edu; or
Thomas L. Casavant, Dept. of Electrical and Computer Engineering,
University of Iowa, Iowa City, IA 52242, phone (319)
335-5953, e-mail tomc@eng.uiowa.edu.
Kobe, Japan. Submit paper to Hiro Haneda,
Electronics Engineering Dept., Kobe Univ., 
Rokko-dai, Nada-ku, Kobe City, Hyogo 857, 
Japan, phone 81 (78) 881-1212, fax 81 (78) 
861-7879. 


Carl K. Chang, EECS Dept., Univ. of Illinois, 
Box 4348, Chicago, IL 60680, phone (312) 
996-4860, fax (312) 413-1386, Compmail+ 
c.chang, e-mail ckchang@uicbert.eecs.uic. 


dam. Cosponsor: Institution of Electrical En¬ 
gineers. Submit paper by Sept. 3, 1990, to Sec¬ 
retariat, EDAC 91, CEP Consultants, 26-28 Al¬ 
bany St., Edinburgh EH1 3QH, Scotland, 
phone 44 (31) 557-2478, fax 44 (31) 557-5749. 


Computational Science in Industry and the 
Comprehensive Univ.: Nov. 8-10, 1990, Po¬ 
mona, Calif. Sponsor: Calif. State Polytechnic 
Univ. at Pomona. Submit abstract by July 30, 
1990, to Bruce P. Hillam, Computer Science 
Dept., Calif. State Polytechnic Univ., 3801 W. 
Temple Ave., Pomona, CA 91768, phone (714) 
869-3440. 

Int'l J. Computer-Aided VLSI Design plans a 
special early-1991 issue on VLSI testing. Pub¬ 
lisher: Ablex. Submit five copies of complete 
manuscript by July 31, 1990, to Sunil R. Das, 
Electrical Engineering Dept., Faculty of Engineering,
Univ. of Ottawa, Ottawa, Ont., Canada
K1N 6N5, phone (613) 564-3374, fax
(613) 564-7681, e-mail das@uotelg01 or Bitnet
srdpb@uottawa.


Second European Distributed Memory 
Computing Conf.: Apr. 22-24, 1991, Munich, 
West Germany. Cosponsors: Gesellschaft für
Informatik et al. Submit paper by Aug. 15,
1990, to Arndt Bode, Computer Science, Technische
Univ. Munich, POB 20-24-20, D-8000
Munich 2, Federal Republic of Germany,
phone 49 (89) 2105-8240, e-mail
bode@infovax.informatik.tu-muenchen.dbp.de.

Int’l Workshop on Network and Operating 
System Support for Digital Audio and 
Video: Nov. 8-9, 1990, Berkeley, Calif. Spon¬ 
sor: Int’l Computer Science Inst. Submit ab¬ 
stract by Aug. 15, 1990, to Ramesh Govindan, 
ICSI, 1947 Center St., Suite 600, Berkeley, CA 
94704-1105, phone (415) 642-4274, ext. 136, 
e-mail av-workshop@berkeley.edu. 


Workshop on Parallel and Distributed 
Simulation: Jan. 21-23, 1991, Anaheim, 

Calif. Submit six copies of full paper by July 
31, 1990, to Richard M. Fujimoto, School of In¬ 
formation and Computer Science, Georgia 
Inst. of Technology, Atlanta, GA 30332, phone
(404) 853-9384, e-mail fujimoto@prism.gatech.edu.

SEARCC 90, South East Asia Regional 
Computer Confederation Conf.: Dec. 4-8, 
1990, Manila. Sponsor: Philippine Computer 
Society. Submit draft by July 31, 1990, to Vic¬ 
tor B. Gruet, Computer Information Systems, 
CIS Bldg., Meralco Compound, Ortigas Ave., 
1602 Pasig, Metro Manila, Philippines, phone 
63 (2) 722-1251, fax 63 (2) 722-0141. 


ASPLOS 4, Fourth Int’l Conf. on
Architectural Support for Programming
Languages and Operating Systems:
Apr. 8-11, 1991, Santa Clara, Calif. Sponsor: 
ACM. Submit seven copies of paper by Aug. 1, 
1990, to Dave Patterson, Computer Science 
Division, 571 Evans Hall, Univ. of California, 
Berkeley, CA 94720, fax (415) 642-5775, e- 
mail patterson@ginger.berkeley.edu. 

IEEE Infocom 91, Conf. on Computer
Communications: Apr. 7-11, 1991, Miami,
Fla. Cosponsors: IEEE Computer and
Communications Societies. Submit five copies
of full paper by Aug. 1, 1990, to N. Shacham,
IEEE Infocom 91, SRI Int’l, 333 Ravenswood
Ave., Menlo Park, CA 94025, phone
(415) 859-5710, e-mail shacham@sri.com.


Int’l Workshop on Unix-Based Software 
Development Environments: Jan. 16-18, 
1991, Dallas, Texas. Sponsor: Usenix Assoc. 
Submit position paper by Aug. 1, 1990, to 
Stuart Feldman, Bellcore, 445 South St., Morristown,
NJ 07962-1910; or Noboru Akima,
Sigma Project, Fifth Akihabara Sanwa Bank
Bldg., Chiyoda-ku, Tokyo, Japan 101. 


IEEE Software plans a special issue in
March 1991 on testing and debugging.
The issue will review the status of the two areas
and present state-of-the-art techniques. Submit
eight copies of article by Aug. 15, 1990, to


Int’l J. Computer-Aided VLSI Design plans a 
special issue on VLSI/systolic arrays. Publisher:
Ablex. Submit five copies of full papers
by Aug. 30, 1990, to Bijan Karimi, Electrical 
and Computer Engineering Dept., Univ. of 
New Haven, West Haven, CT 06516, phone 
(203) 932-7164. 


ETC 91, 1991 European Test Conf.:
Apr. 17-19, 1991, Munich, West Germany.
Sponsor: VDE (Zentralstelle Tagungen
und Seminare). Submit four copies of abstract
or full paper by Aug. 31, 1990, to ETC 91, c/o
Bennetts Associates, Burridge Farm, Burridge,
Southampton SO3 7BY, UK, fax (44)
489-579519.


CAIA 91, Seventh IEEE Conf. on Artificial
Intelligence Applications: Feb.
24-28, 1991, Miami Beach, Fla. Submit paper
by Aug. 31, 1990, to Tim Finin, Center for Advanced
Information Technology, Unisys, 70E
Swedesford Rd., PO Box 517, Paoli, PA 19301,
phone (215) 648-2840, fax (215) 648-2288,
e-mail finin@prc.unisys.com.


IEEE Trans. Reliability plans a special issue
on design for reliability of telecommunication
systems and services. Submit author letter of
commitment (including brief paper description)
by Sept. 1, 1990, and six copies of the
manuscript by Nov. 15, 1990, to Andrew
Reibman, AT&T Bell Labs, Rm. 2L-518,
Holmdel, NJ 07733, phone (201) 949-1930,
fax (201) 949-7724, e-mail alr@hoqaa.att.com;
or C.S. Raghavendra, EE-Systems
Dept., SAL 300, Univ. of Southern California,
Los Angeles, CA 90089, phone (213) 743-5532,
fax (213) 745-7284, e-mail raghu@surya.usc.edu.


Electrosoft plans a special issue on software 
for system transient modeling. Publisher: 
Computational Mechanics Publications. Submit
paper by Sept. 2, 1990, to H.W. Dommel,
Electrical Engineering Dept., Univ. of British 
Columbia, 2356 Main Hall, Vancouver, B.C., 
Canada V6T 1W5, phone (604) 228-2793. 


CG Int’l 91: June 22-28, 1991, Cambridge, 
Mass. Cosponsors: Computer Graphics Society,
MIT. Submit six copies of summary by
Sept. 4, 1990, and six copies of full paper by
Nov. 5, 1990, to N.M. Patrikalakis, MIT Rm. 5-428,
77 Massachusetts Ave., Cambridge, MA
02139, phone (617) 253-4555, fax (617) 253- 
8125, e-mail nmp@deslab.mit.edu. 

Auto Carto 10, 10th Int’l Symp. on Automated
Cartography: Mar. 25-28, 1991. Cosponsors:
American Cartographic Assoc. et al.
Submit five copies of full draft paper by Sept. 
7, 1990, to Auto Carto 10, Geography Dept., 
105 Wilkeson, North Campus, State Univ. of 
New York at Buffalo, Amherst, NY 14260, 
phone (716) 636-2545, fax (716) 636-2329. 

ICSE 13, 13th Int’l Conf. on Software
Engineering: May 13-16, 1991, Austin,
Texas. Cosponsor: ACM. Submit eight copies 
of paper by Sept. 14, 1990, to David Barstow, 
Schlumberger Lab for Computer Science, PO 
Box 200015, Austin, TX 78720-0015. 


First IEEE Int’l Workshop on Interoperability
in Multidatabase Systems:
Apr. 8-9, 1991, Kyoto, Japan. Submit seven
copies of extended abstract by Sept. 15, 1990,
to Marek Rusinkiewicz, Univ. of Houston,
Computer Science Dept., Houston, TX 77204-3475,
phone (713) 749-4791, e-mail marek@cs.uh.edu;
or Yahiko Kambayashi, Kyushu
Univ., Computer Science and Computer Engineering
Dept., Hakozaki, Fukuoka 812, Japan,
fax 81 (92) 641-1825, e-mail
yahiko@csce.kyushu-u.ac.jp.


CCW 91, Third IEEE Conf. on Computer
Workstations: May 15-17, 1991,
Cape Cod, Mass. Sponsor: IEEE Computer Society
Technical Committee on Operating Systems.
Submit five copies of paper by Sept. 15,
1990, to Keith Marzullo, Computer Science 
Dept., Upson Hall, Cornell Univ., Ithaca, NY 
14853. 


Second Int’l Symp. on Database Systems
for Advanced Applications: Apr.
2-4, 1991, Tokyo. Sponsor: Information Processing
Society of Japan. Submit three copies
of full paper by Sept. 15, 1990, to Akifumi
Makinouchi, Computer Science and Communication
Engineering Dept., Kyushu Univ.,
Hakozaki 6-10-1, Fukuoka 812, Japan, phone
81 (92) 641-1101, ext. 6055, fax 81 (92) 641-1101,
ext. 5418, e-mail
akifumi@vax88.csce.kyushu-u.ac.jp.

IEEE J. Solid-State Circuits plans a series of 
special issues on microelectronics systems. 
Submit five copies of complete paper by Sept. 
15, 1990, to Donald W. Bouldin, Electrical and 
Computer Engineering, Univ. of Tennessee, 
Knoxville, TN 37996-2100, phone (615) 974- 
5444, fax (615) 974-5492, e-mail
bouldin@sunl.engr.utk.edu.

RTA 91, Fourth Int’l Conf. on Rewriting 
Techniques and Applications: Apr. 10-12, 
1991, Como, Italy. Sponsors: State Univ. of
Milan. Submit 10 copies of extended abstract
or full paper by Sept. 15, 1990, to Ronald V.
Book, Theoretische Informatik, Inst. für Informatik,
Univ. Würzburg, Am Hubland, D-8700
Würzburg, West Germany, US phone (805)
961-2778, e-mail book%henri@hub.ucsb.edu.

Fifth Int’l Parallel Processing Symp.: Mar. 
27-29, 1991, Newport Beach, Calif. Submit 
four copies of complete paper or 1,000-word 
summary by Sept. 15, 1990, to V.K. Prasanna 
Kumar, Electrical Engineering-Systems 
Dept., SAL 344, Univ. of Southern California, 
Los Angeles, CA 90089-0781, phone (213) 
743-5236, fax (213) 745-7284, e-mail
ipps@ashoka.usc.edu.

24th Computer Simulation Conf.: Apr. 1-5, 
1991, New Orleans. Sponsor: Society for Computer
Simulation. Submit abstract by Sept. 15,
1990, and full paper by Dec. 1, 1990, to George
W. Zobrist, Computer Science Dept., Univ. of 
Missouri at Rolla, Rolla, MO 65436, phone 
(314) 341-4836, e-mail c2816@umrvmb.umr. 


1991 IEEE Int’l Conf. on Robotics and Automation:
Apr. 7-12, 1991, Sacramento, Calif.
Sponsor: IEEE Robotics and Automation Society.
Submit four copies of paper by Sept. 16,
1990, to T.J. Tarn, Systems Science and Mathematics,
Campus Box 1040, Washington
Univ., St. Louis, MO 63130. 


Fourth Int’l Conf. on Industrial and
Engineering Applications of Artificial
Intelligence and Expert Systems: June 2-5,
1991, Kauai, Hawaii. Sponsors: ACM et al. 
Submit four copies of extended abstract by 
Oct. 1, 1990, to Jim Bezdek, Computer Science 
Div., Univ. of West Florida, Pensacola, FL 
32514, phone (904) 474-2784, fax (904) 474- 
2096, e-mail jbezdek@uwf.bitnet. 

CHI 91, 1991 Conf. on Human Factors
in Computing Systems: Apr. 28-May 2,
1991, New Orleans. Sponsor: ACM. Submit
six copies of abstract/paper by Oct. 1, 1990, to
Peter Polson, Psychology Dept., Univ. of
Colorado, Muenzinger Hall, Campus Box 345, 
Boulder, CO 80309-0345, phone (303) 492- 
5622, e-mail ppolson@clipr.colorado.edu. 


1991 IEEE Int’l Symp. on Information Theory:
June 23-29, 1991, Budapest, Hungary.
Submit short paper by Oct. 1, 1990, and long
paper by Nov. 1, 1990, to Anthony Ephremides,
Electrical Engineering Dept., Univ. of
Maryland, College Park, MD 20742, phone 
(301) 454-6871, e-mail tony@eng.umd.edu. 


ISCAS 91, 24th IEEE Int’l Symp. on Circuits
and Systems: June 11-14, 1991, Singapore.
Sponsor: IEEE Circuits and Systems Society.
Submit six copies of summary by Oct. 1,
1990, to Technical Program Chair, ISCAS 91 
Secretariat, Communication Int’l Associates, 
44/46 Tanjong Pagar Rd., Singapore 0208, 
phone (65) 226-2823, fax (65) 226-2877. 


Fifth Int’l Workshop on High-Level Synthesis:
Mar. 3-6, 1991, Bühlerhöhe, West Germany.
Cosponsors: IEEE et al. Submit 12 copies
of extended summary by Oct. 8, 1990, to
Wolfgang Rosenstiel, Forschungszentrum Informatik
an der Univ. Karlsruhe, Haid-und-Neu
Strasse 10-14, D-7500 Karlsruhe, Germany,
phone 49 (721) 6906-81, fax 49 (721)
6906-88, e-mail rosenstiel@ira.uka.de.


Symp. on Solid Modeling Foundations and 
CAD/CAM Applications: June 5-7, 1991, 
Austin, Texas. Sponsor: ACM SIGGraph. Submit
abstract by Oct. 15, 1990, and five copies
of full paper by Nov. 30, 1990, to Jaroslaw Rossignac,
J2-C03, IBM T.J. Watson Research
Center, PO Box 704, Yorktown Heights, NY 
10598, phone (914) 784-7630, fax (914) 784- 
7455, e-mail jarek@ibm.com. 

ICDCS 91, 11th Int’l Conf. on Distributed
Computing Systems: May 20-24,
1991, Arlington, Texas. Submit five copies of
abstract and paper by Oct. 23, 1990, to Benjamin
W. Wah, ICDCS 91, Coordinated Science
Lab, MC228, Univ. of Illinois, 1101 W. Springfield
Ave., Urbana, IL 61801-3082, phone
(217) 333-3516, fax (217) 244-1764, e-mail
wah%aquinas@uxc.cso.uiuc.edu.


Advanced Research in VLSI Conf.: Mar. 25- 
27, 1991, Santa Cruz, Calif. Submit five copies 
of draft paper by Nov. 1, 1990, to Carlo H. Sequin,
Univ. of California, CS Div., 529B Evans
Hall, Berkeley, CA 94720. 


SCM 3, Third Int’l Workshop on Software
Configuration Management:
June 12-14, 1991, Trondheim, Norway. Cosponsors:
ACM et al. Submit four copies of position
paper and full paper by Nov. 15, 1990, to
Peter Feiler, Software Engineering Inst., Carnegie
Mellon Univ., Pittsburgh, PA 15213-3890,
phone (412) 268-7790, e-mail phf@sei.


Third Symp. on Integrated Ferroelectrics:
Apr. 3-5, 1991, Colorado Springs, Colo. Submit
abstract by Nov. 15, 1990, to Conf. Secretary,
Microelectronics Research Lab, Univ. of
Colorado at Colorado Springs, PO Box 7150, 
Colorado Springs, CO 80933-7150, phone 
(719) 593-3488, fax (719) 594-4257. 

PARLE 91, Conf. on Parallel Architectures 
and Languages Europe: June 10-13, 1991, 
Eindhoven, The Netherlands. Cosponsors: 
Commission of European Communities et al. 
Submit paper by Nov. 15, 1990, to F. Stoots, 
Philips Research Labs, PO Box 80.000, 5600 
JA Eindhoven, The Netherlands, fax 31 (40) 
744-758, e-mail stoots@dooma.prl.philips.nl. 

First Int’l Conf. on Artificial Intelligence in 
Design: June 25-27, 1991, Edinburgh, Scotland.
Submit full paper by Nov. 16, 1990, to
John Gero, Architectural and Design Science
Dept., Univ. of Sydney, NSW 2006, Australia,
phone 61 (2) 692-2328, fax 61 (2) 692-3031,
e-mail john@archsci.arch.su.oz or
john%archsci.su.oz@uunet.uu.net.

ESEC 91, Third European Software Engineering
Conf.: Oct. 21-24, 1991, Milano, Italy.
Sponsors: AFCET et al. Submit six copies
of full paper and abstract by Jan. 15, 1991, to
Alex van Lamsweerde, Unité d’Informatique,
Univ. Catholique de Louvain, Place Sainte
Barbe 2, B-1348 Louvain-La-Neuve, Belgium,
e-mail esec@info.ucl.ac.be.



July 1990 


Int’l Workshop on Semantics for Concurrency,
July 23-25, Leicester, UK. Sponsor:
British Computer Society. Contact Marta
Kwiatkowska, Workshop on Semantics for
Concurrency, Computing Studies Dept., Univ.
of Leicester, Leicester LE1 7RH, UK, phone
(44) 533-523603. 

Int’l Workshop on Principles of Diagnosis, 
July 23-25, Menlo Park, Calif. Cosponsors: 
American Assoc. for Artificial Intelligence,
Price Waterhouse. Contact Walter Hamscher,
Price Waterhouse Technology Center, 68 Willow
Rd., Menlo Park, CA 94025, phone (415)
688-6669, e-mail hamscher@pw.com or 
wch@ai.ai.mit.edu. 

10th Int’l Conf. in Computer Science, July
23-27, Santiago, Chile. Contact Joachim von
zur Gathen, Computer Science Dept., Univ. of
Toronto, 10 King’s College Rd., Toronto, Canada
M5S 1A4, phone (416) 978-6024, e-mail
gathen@theory.toronto.edu. 

DIAC 90, Directions and Implications of 
Advanced Computing, July 28, Boston. 
Sponsor: Computer Professionals for Social 
Responsibility. Contact Douglas Schuler, 
Boeing Computer Services, MS 7L-64, PO 
24346, Seattle, WA 98124-0346, phone (206) 
634-2771. 

AAAI 90, Nat’l Conf. on Artificial Intelligence,
July 29-Aug. 3, Boston. Sponsor:
American Assoc. for Artificial Intelligence.
Contact AAAI, 445 Burgess Dr., Menlo Park,
CA 94025, phone (415) 328-3123, fax (415)
321-4457; Edward Lafferty, AI Center, Mitre,
MS A350, Burlington Rd., Bedford, MA 
01730, phone (617) 271-2773; or Marcel 
Schoppers, Advanced Decision Systems, 1500 
Plymouth St., Mountain View, CA 94043, 
phone (415) 960-7553, e-mail marcel@ads. 


August 1990 


SIGGraph 90, 17th Conf. on Computer
Graphics and Interactive Techniques,
Aug. 6-10, Dallas. Sponsor: ACM.
Contact Assoc. for Computing Machinery, 11
W. 42nd St., New York, NY 10036, phone 
(212) 869-7440; or SIGGraph 90, 111 E. 
Wacker Dr., Suite 600, Chicago, IL 60601, fax 
(312) 938-1232. 


SIGDA Workshop on Logic-Level Modeling 
for ASICs, Aug. 12-14, Monterey, Calif. Contact
Colleen Matteis, 990 W. Taylor St., San
Jose, CA 95126. 
In the accompanying Calendar, the IEEE Computer Society logo identifies
the conferences the society is sponsoring or participating in. Other conferences
of interest to our readers, as well as their sponsors, are also listed.

For inclusion in Call for Papers or Calendar, submit information at least six 
weeks before the month of publication (i.e., for the September 1990 issue, send 
information for receipt by July 15, 1990) to Chuck Governale, Calendar Dept., 
Computer, PO Box 3014, Los Alamitos, CA 90720-1264. 


Eighth Conf. on University Programs in
Computer-Aided Engineering, Design and
Manufacturing, Aug. 12-15, Ann Arbor,
Mich. Contact Continuing Engineering Education,
300 Chrysler Center, North Campus,
Univ. of Michigan, Ann Arbor, MI 48109,
phone (313) 764-8490, fax (313) 936-0253. 

VHDL Methods Workshop, Aug. 13-15,
Charlottesville, Va. Contact Ron
Waxman, Univ. of Virginia, Thornton Hall,
Charlottesville, VA 22903, phone (804) 924-6086,
fax (804) 924-8818, e-mail
ronw@virginia.edu.

16th Int’l Conf. on Very Large Data 
Bases, Aug. 13-16, Brisbane, Australia. 
Contact David Reiner, Lotus Development, 1 
Canal Park, Cambridge, MA 02141, phone 
(617) 577-8500, e-mail dreiner@lotus.com. 


ICPP 90, 19th Int’l Conf. on Parallel Processing,
Aug. 13-17, St. Charles, Ill. Sponsor:
Pennsylvania State Univ. Contact Benjamin
W. Wah, Coordinated Science Lab., Univ. of
Illinois, 1101 W. Springfield Ave., Urbana, IL
61801-2082, phone (217) 333-3516; or Tse-yun
Feng, EE East Bldg., Pennsylvania State
Univ., University Park, PA 16802, phone (814)
863-1469.

Fifth Ada Software Engineering, Education,
and Training Symp., Aug. 14-16, Alexandria,
Va. Sponsor: Ada Software Engineering
Education and Training Team. Contact
Catherine W. McDonald, Inst. of Defense
Analysis, 1801 N. Beauregard St., Alexandria,
VA 22311, phone (703) 824-5531, e-mail 
mcdonald@ida.org or mcdonald@ajpo.sei. 


Second Software Quality Workshop, Aug. 
14-16, Rochester, N.Y. Sponsor: Rome Air 
Development Center. Contact Barbara 
Radzisz, Data and Analysis Center for Software,
PO Box 120, Utica, NY 13503, phone
(315) 336-0937. 


K. Brayton, Electrical Engineering and Computer
Science Dept., Univ. of California at
Berkeley, Berkeley, CA 94760. 

Int’l Symp. on Algorithms, Aug. 16-18, Tokyo.
Sponsor: Information Processing Society
of Japan Special Interest Group on Algorithms.
Contact Tetsuo Asano, Osaka Electro-Communication
Univ., Hatsu-cho, Neyagawa,
Osaka 572, Japan, phone 81 (720) 24-1131. 

UPADI 90, 21st Convention of the Pan
American Federation of Engineering Societies,
Aug. 19-24, Washington, DC. Cosponsors:
American Assoc. of Engineering Societies,
American Society of Civil Engineers.
Contact UPADI 90, ASCE, 345 E. 47th St.,
New York, NY 10017, phone (212) 705-7218.

Hot Chips II, Symp. on High-Performance
Chips, Aug. 20-21, Santa Clara,
Calif. Sponsor: IEEE Computer Society Technical
Committee on Microprocessors and Microcomputers.
Contact Hasan S. Alkhatib,
EECS Dept., Santa Clara Univ., Santa Clara, 
CA 95053, phone (408) 554-4485, fax (408) 
554-5475, e-mail halkhatib@scu.edu. 

Second Int’l Joint Conf. of ISSAC 90 (1990 
Int’l Symp. on Symbolic and Algebraic 
Computation) and AAECC 8 (Eighth Int’l 
Conf. on Applied Algebra, Algebraic Algorithms,
and Error-Correcting Codes), Aug.
20-24, Tokyo. Cosponsors: ACM et al. Contact
Conf. Secretariat, IJC-2, Scientist, Inc., Yamazaki
Bldg., 3-2 Kanda Suruga-dai, Chiyoda-ku,
Tokyo 101, Japan.

Coling 90, 13th Int’l Conf. on Computational
Linguistics, Aug. 20-25, Helsinki, Finland.
Contact Hans Karlgren, KVAL, Skeppsbron
26, S-111 30 Stockholm, Sweden, phone
46 (8) 789-6683.


September 1990 


TAU 90, 1990 Int’l Workshop on Timing Issues
in the Specification and Synthesis of
Digital Systems, Aug. 15-17, Vancouver,
B.C., Canada. Sponsor: ACM. Contact Robert

ISPRS Commission V Symp., Close-Range
Photogrammetry Meets Machine
Vision, Sept. 3-7, Zurich. Cosponsor:
Int’l Society for Photogrammetry and Remote
Sensing et al. Contact Armin Gruen, Inst. of
Geodesy and Photogrammetry, ETH-Hoenggerberg,
CH-8093, Zurich, Switzerland,
phone 41 (1) 377-3051.


EuroVHDL 90, First European Working
Conf. on VHDL Methods, Sept. 4-7,
Marseille, France. Cosponsors: ACM et al.
Contact Petra Michel, Siemens, A.G. Dept. 
ZFEISEA1, Otto Hahn Ring 6, Munich 83, 
West Germany. 


ASAP 90, Int’l Conf. on Application- 
Specific Array Processors, Sept. 5-7, 
Princeton, N.J. Cosponsor: Princeton Univ. 
Contact S.Y. Kung, Electrical Engineering
Dept., Princeton Univ., Princeton, NJ 08544, 
phone (609) 258-3780. 

13th Int’l ACM/SIGIR Conf. on Research 
and Development in Information Retrieval, 
Sept. 5-7, Brussels. Contact Jean-Luc Vidick, 
Univ. Libre de Bruxelles, Avenue F.D. Roosevelt,
Infodoc, CP142, 1050 Brussels, Belgium.


Int’l Workshop on VLSI for Artificial Intelligence
and Neural Networks, Sept. 5-7, Oxford,
England. Contact Jose G. Delgado-Frias,
Electrical Engineering Dept., SUNY, Binghamton,
NY 13901, phone (607) 777-4806,
e-mail delgado@bingvaxu.cc.binghamton.edu.

1990 Int’l Electronics Packaging Conf., 
Sept. 9-13, Marlborough, Mass. Sponsor: Int’l 
Electronics Packaging Society. Contact IEPS, 
114 N. Hale St., Wheaton, IL 60187, phone 
(708) 260-1044. 


Workshop on Computers in Systematic Biology,
Sept. 9-14, Davis, Calif. Sponsor: Nat’l
Science Foundation. Contact Renaud Fortuner,
California Dept. of Food and Agriculture,
Analysis and Identification, Rm. 340, PO Box 
942871, Sacramento, CA 94271-0001, phone 
(916) 445-4521. 


ITC 90, Int’l Test Conf., Sept. 10-12,
Washington, DC. Cosponsor: IEEE
Philadelphia Section. Contact Donald Denburg,
AT&T Bell Labs, 1247 S. Cedar Crest
Blvd., Allentown, PA 18103; or ITC, 1201 
Sussex Turnpike, Suite 101, PO Box 264, Mt. 
Freedom, NJ 07970, phone (201) 895-5260. 


IEEE Conf. on Managing Expert System
Programs and Projects, Sept. 10-12,
Washington, DC. Sponsor: IEEE Computer
Society Technical Committee on Expert
Systems. Contact Jay Liebowitz, Management 
Sciences Dept., George Washington Univ., 
Washington, DC, phone (202) 994-6969. 


Second Int’l Workshop on Advances in Robot
Kinematics, Sept. 10-12, Linz, Austria.
Sponsors: Research Inst. for Symbolic Computation
et al. Contact Sabine Stifler, RISC,
Johannes Kepler Univ., A-4040 Linz, Austria, 
phone 43 (7236) 3231-50; or Jadran Lenarcic, 
Josef Stefan Inst., Univ. of Edvard Kardelj, 
Jamova 39, 61111 Ljubljana, Yugoslavia, 
phone 38 (61) 214-399. 


Symp. on Object-Oriented Programming 
Emphasizing Practical Applications, Sept. 
14-15, Poughkeepsie, N.Y. Sponsor: Marist 
College. Contact James TenEyck, Marist College,
Poughkeepsie, NY 12601-1387, phone
(914) 471-3240, e-mail jzbv@maristb.bitnet. 

ICCD 90, IEEE Int’l Conf. on Computer
Design: VLSI in Computers and
Processors, Sept. 16-19, Cambridge, Mass.
Contact Edward M. Middlesworth, Hewlett-Packard,
Bldg. 25U, PO Box 10350, Palo Alto,
CA 94303-0867, phone (415) 857-5485; or
ICCD 90, IEEE Computer Society, 1730 Massachusetts
Ave. NW, Washington, DC 20036-
1903, phone (202) 371-1013. 


Fourth Digital Signal Processing Workshop,
Sept. 16-19, New Paltz, N.Y. Sponsor:
IEEE Signal Processing Society. Contact K.S.
Arun, Coordinated Science Lab, Univ. of Illinois
at Urbana-Champaign, 1101 W. Springfield
Ave., Urbana, IL 61801, phone (217) 333-
7678, fax (217) 244-1764. 


Internal Audit Advanced Technology Forum,
Sept. 17-19, Orlando, Fla. Sponsor: Inst.
of Internal Auditors. Contact Stephen M. Paroby,
Ernst and Young, 787 Seventh Ave., New
York, NY 10019, phone (212) 830-6000. 


ASIC 90, Third IEEE ASIC Seminar
and Exhibit, Sept. 17-21, Rochester,
N.Y. Cosponsors: IEEE Rochester Section,
ACM. Contact Kenneth Hsu, Rochester Inst. of
Technology, Computer Engineering Dept., 
Rochester, NY 14623, phone (716) 475-2655; 
or Lynne Engelbrecht, 170 Mt. Read Blvd., 
Rochester, NY 14611, phone (716) 328-2310, 
fax (716) 436-9370. 


Technical Univ. of Vienna, Applied Computer 
Science Dept., CD Lab for Expert Systems, 
Paniglgasse 16, 1040 Vienna, Austria, fax 43 
(222) 505-5304, e-mail nejdl@vexpert.at. 

Tencon 90, IEEE Region 10 Conf. on Computer
and Communication Systems, Sept.
24-27, Hong Kong. Cosponsor: IEEE Hong 
Kong Section. Contact Y.S. Cheung, Electrical 
and Electronic Engineering Dept., Univ. of 
Hong Kong, Pokfulam, Hong Kong. 

SIGComm 90, Sept. 24-27, Philadelphia. 
Sponsor: ACM SIGComm. Contact David Farber,
Univ. of Pennsylvania, 200 S. 33rd St.,
Philadelphia, PA 19104-6389, phone (215)
898-9508, fax (215) 898-0587, e-mail
farber@cis.upenn.edu; or Phil Karn, Bell Communications
Research, MS 2P-357, 445 South St., PO
Box 1910, Morristown, NJ 07962-1910, phone 
(201) 829-4299. 

AIRIES 90, AI Research in the Environmental
Sciences Workshop, Sept. 25-27,
Montreal, Que., Canada. Cosponsors: Univ. of
Quebec at Montreal, Centre de Recherche Informatique
de Montreal. Contact Rosemary M.
Dyer, GL/LYP, AIRIES 90, Air Force Geophysics
Lab, Hanscom Air Force Base, MA
01731, fax (617) 377-4498.


Infojapan 90, Int’l Conf. on Information
Technology, Oct. 1-5, Tokyo.
Sponsor: Information Processing Society of
Japan. Contact Infojapan 90 Secretariat, c/o
Simul Int’l, Kowa Bldg. No. 9, 1-8-10 Akasaka,
Minato-ku, Tokyo 107, Japan, phone 81
(3) 586-8691, fax 81 (3) 583-8336. 

Sixth Int’l Conf. on the Application of
Standards for Open Systems Interconnection,
Oct. 2-4, Gaithersburg, Md. Cosponsor:
Nat’l Inst. of Standards and Technology.
Contact Brenda Gray, NIST/OSI, Rm. B217, 
Bldg. 225, Gaithersburg, MD 20899, phone 
(301) 975-3664. 


28th Allerton Conf. on Communication,
Control, and Computing, Oct. 3-5, Monticello,
Ill. Contact Allerton Conf., c/o Donna
J. Brown, Univ. of Illinois at Urbana-Champaign,
Coordinated Science Lab, 1101 W.
Springfield Ave., Urbana, IL 61801, phone
(217) 244-0581, e-mail djb@uicsl.csl.uiuc. 


1990 Workshop on Visual Languages, 
Oct. 4-6, Skokie, Ill. Sponsors: Univ. of 
Pittsburgh et al. Contact S.K. Chang, Computer
Science Dept., Univ. of Pittsburgh,
Pittsburgh, PA 15260.


Fourth Conf. on Putting Methods and Tools 
into Practice as Aids to Design Information 
Systems, Sept. 25-27, Nantes, France. Sponsor:
Univ. de Nantes, Inst. Univ. de Technologie,
Lab. d’Informatique, Liana. Contact H.
Habrias, 3 Rue du Marechal Joffre, 44041 Nantes
Cedex 01, France, phone (33) 4030-6090,
fax (33) 4030-6001. 


Frontiers 90, Third Symp. on Frontiers
of Massively Parallel Computation,
Oct. 8-10, College Park, Md. Cosponsors:
Nat’l IEEE Capital Area Chapter, NASA
Goddard Space Flight Center. Contact Johanna
Weinstein, Frontiers 90, UMIACS, Univ. of
Maryland, A.V. Williams Bldg., College Park, 
MD 20742, phone (301) 454-1808. 


CI 90, 1990 Int’l Symp. on Computational
Intelligence, Sept. 27-29, Milano,
Italy. Sponsors: ACM, F.I.S. Cassa di
Rosp. o. PC. Contact Giorgio Valle, Universita 
Milano. Dip. Scienze Della Informazione, Via 
Moretto 20133, Milano, Italy, phone 39 (2) 
757-5228, fax 39 (2) 761-10556, e-mail 
valle@imiucca.bitnet. 


Third UNB Artificial Intelligence Workshop,
Oct. 9, Fredericton, N.B., Canada. Sponsor:
Univ. of New Brunswick. Contact B.G.
Nickerson, School of Computer Science, Univ. 
of New Brunswick, PO Box 4400, Fredericton, 
N.B., Canada E3B 5A3, phone (506) 453- 
4566, fax (506) 453-3566, e-mail bgn@unb. 


EP 90, Electronic Publishing 90, Sept.
18-20, Gaithersburg, Md. Sponsor: Nat’l
Inst. of Standards and Technology. Contact
Peter R. King, Computer Science Dept., Univ. 
of Manitoba, Winnipeg, Man., Canada R3T 
2N2, phone (204) 474-9935. 


ICARCV 90, Int’l Conf. on Automation,
Robotics, and Computer Vision, Sept. 18-21,
Singapore. Cosponsors: IEEE Singapore
Chapter et al. Contact Dinesh P. Mital,
ICARCV 90, School of Electrical and Electronic
Engineering, Nanyang Technological
Inst., Nanyang Ave., Singapore 2263, Republic
of Singapore, phone (65) 660-5399.

Conf. on Multiuser Interfaces and Applications,
Sept. 24-26, Heraklion, Crete, Greece.
Cosponsors: IFIP et al. Contact Rena Kalaitzaki,
Computer Science Dept., Univ. of Crete,
GR 714-09 Heraklion, Crete, Greece, phone 30
(81) 210-057.


Int’l Workshop on Expert Systems in Engineering,
Sept. 24-26, Vienna, Austria. Sponsor:
Christian Doppler Expert Systems Lab,
Univ. of Vienna. Contact Wolfgang Nejdl, 


Future Trends 90, Workshop on Future
Trends of Distributed Computing
Systems, Sept. 30-Oct. 2, Cairo. Contact 
Stephen S. Yau, Univ. of Florida, CIS Dept., 
Rm. 301, Gainesville, FL 32611, phone (904) 
335-8006. 


October 1990 


15th Conf. on Local Computer 
Networking, Oct. 1-3, Minneapolis, 
Minn. Contact Marc Cohn, Advanced Development
Div., Raychem Corp., 300 Constitution
Dr., Menlo Park, CA 94025-1164, phone
(415) 361-3902, fax (415) 361-6099. 

Second Int’l Conf. on Algebraic and Logic 
Programming, Oct. 1-3, Nancy, France. Contact
Wolfgang Wechler, TU Braunschweig,
Theoretische Informatik, Postfach 3329, D- 
3300 Braunschweig, West Germany, e-mail 
wechler@infbs.uucp; or Helene Kirchner, 
CRIN, BP239, 54506 Vandoeuvre-les-Nancy 
Cedex, France. 


Ninth Symp. on Reliable Distributed 
Systems, Oct. 9-11, Huntsville, Ala. 
Contact Raif M. Yanney, TRW, MS DH2/ 
2328, 1 Space Park, Redondo Beach, CA 
90278, phone (213) 764-6033. 

Northcon 90, Oct. 9-11, Seattle. Cosponsors: 
IEEE et al. Contact Northcon 90 Professional 
Program Committee, c/o Ramona Baker, 8110 
Airport Blvd., Los Angeles, CA 90045-3194, 
phone (213) 215-3796, ext. 222. 

PDCS 90, ISMM Int’l Conf. on Parallel and 
Distributed Computing and Systems, Oct. 
10-12, New York City. Sponsor: Int’l Society 
for Mini and Microcomputers. Contact R. 
Ammar, U155, Computer Science and Engineering
Dept., Univ. of Connecticut, Storrs,
CT 06268, fax (203) 486-0318. 

EuroForum 90, Oct. 11-12, Daresbury,
Cheshire, UK. Contact Kate Faulkner, EuroForum
90, ICL, Manchester M12 5DR, UK,
phone 44 (61) 223-1301, fax 44 (61) 223-1207.

Second Int’l Conf. on Microelectronics, Oct. 
13-15, Damascus, Syria. Sponsor: Arab 
School of Science and Technology. Contact
M.I. Elmasry, VLSI Research Group, Univ. of 
Waterloo, Waterloo, Ont., Canada N2L 3G1, 
phone (519) 885-1211, ext. 3753. 

1990 Fall VHDL Users’ Group Meeting,
Oct. 14-17, Oakland, Calif. Contact Rachel
Rusting, Intermetrics, 733 Concord, Cambridge,
MA 02138, phone (617) 661-1840.

AIPR 19, Workshop on Applied Imagery 
Pattern Recognition, Oct. 17-19, McLean, 
Va. Sponsors: Society of Photooptical Instrumentation
Engineers, Rome Air Development
Center. Contact Brian Mitchell, ERIM, PO 
Box 8618, Ann Arbor, MI 48106, phone (313) 
994-1200, ext. 2713. 

12th Saudi Nat’l Computer Conf. on Planning
for the Informatics Society, Oct. 21-24,
Riyadh, Saudi Arabia. Cosponsors: King Saud
Univ., Saudi Computer Society. Contact Mohammad
M. Mandurah, College of Computer
and Information Sciences, PO Box 51178, Riyadh,
11543, Saudi Arabia, phone 966 (1) 467-6993.


OOPSLA 90, Fifth Conf. on Object-Oriented
Programming Systems, Languages,
and Applications, Oct. 21-25, Ottawa, Canada.
Sponsor: ACM. Contact Assoc. for Computing
Machinery, 11 W. 42nd St., New York,
NY 10036, phone (212) 869-7440.


FOCS, 31st Foundations of Computer 
Science, Oct. 22-24, St. Louis, Mo. Contact
Christos Papadimitriou, Computer Science
Dept., Univ. of California at San Diego, La 
Jolla, CA 92093, phone (619) 534-2086. 


Int’l Conf. on Computer Applications in Developing
Countries, Oct. 22-24, Benin City,
Nigeria. Sponsor: Large Scale Systems Research
Group, Univ. of Benin. Contact E.A.
Onibere, Mathematics and Computer Science 
Dept., Univ. of Benin, P.M.B. 1154, Benin 
City, Nigeria. 


Ninth National Conf. on EDP System and
Software Quality Assurance, Oct. 22-24,
Washington, DC. Sponsor: Data Processing
Management Assoc. Contact US Professional 
Development Inst., EDP System and Software 
Quality Assurance, 1734 Elton Rd., Suite 221, 
Silver Spring, MD 20903-1733, phone (301) 
445-4400, fax (301) 445-5722. 


JCIT 5, Fifth Jerusalem Conf. on Information
Technology, Oct. 22-25,
Jerusalem, Israel. Sponsor: Information Processing
Assoc. of Israel. Contact Abraham
Peled, IBM T.J. Watson Research Center, PO 
Box 704, Yorktown Heights, NY 10598. 


CC 90, Third Int’l Workshop on Compiler 
Compilers, Oct. 22-26, Schwerin, East Germany.
Sponsors: German Democratic Republic
Academy of Sciences Inst. of Informatics
and Computing Technique et al. Contact Michael
Albinus, CC 90 Organizing Committee,
Akademie der Wissenschaften der DDR, Inst.
für Informatik und Rechentechnik, Rudower
Chaussee 5, Berlin, GDR — 1199. 


Third Int’l Symp. on Artificial Intelligence, 
Oct. 22-26, Monterrey, N.L. Mexico. Sponsors:
ITESM (Inst. Tecnologico y de Estudios
Superiores de Monterrey) et al. Contact Hugo
Terashima, Centro de Inteligencia Artificial,
ITESM, Suc. de Correos “J”, C.P. 64849 Monterrey,
N.L. Mexico, phone 52 (83) 58-2000,
fax 52 (83) 58-0771, e-mail isai@tecmtyvm.bitnet.


Visualization 90, Oct. 23-26, San Francisco.
Contact Bruce Brown, Oracle
Corp., 20 Davis Dr., Belmont, CA 94002, 
phone (415) 598-3628. 


Esorics 90, European Symp. on Research in 
Computer Security, Oct. 24-26, Toulouse, 
France. Sponsor: AFCET. Contact Martin 
Gilles, 16 Parc de Diane, 78350 Jouy en Josas,
Toulouse Cedex, France. 


First Japanese Knowledge Acquisition for 
Knowledge-Based Systems Workshop, Oct. 
25-26, Kyoto, Japan, and Oct. 29-31, Tokyo. 
Cosponsors: Kansai Inst. of Information Systems
et al. Contact John H. Boose, Advanced
Technology Center, Boeing Computer Services
7L-64, PO Box 24346, Seattle, WA
98124, phone (206) 865-3253. 


NACLP 90, 1990 North American
Conf. on Logic Programming, Oct. 28-Nov. 1,
Austin, Texas. Cosponsor: ACM. Contact
Carlo Zaniolo, MCC, 3500 W. Balcones
Center Dr., Austin, TX 78759, phone (512) 
338-3442. 


Int’l Conf. on Information Technology, Oct. 
29-31, Bournemouth, UK. Sponsor: Institution
of Electrical Engineers. Contact Conf.
Services, IEE, Savoy Place, London WC2R 
0BL, UK, phone 44 (71) 240-1871, fax 44 (71) 
240-7735. 


Eighth Pacific Northwest Software Quality 
Conf., Oct. 29-31, Portland, Ore. Sponsor: 
PNSQC Committee. Contact Terri Moore, Pacific
Agenda, PO Box 10142, Portland, OR
97210, phone (503) 223-8633. 

ISCIA 5, Fifth Int’l Symp. on Computer and 
Information Sciences, Oct. 30-Nov. 2, Cappadocia,
Nevsehir, Turkey. Sponsors: Istanbul
Technical Univ. et al. Contact A. Emre Harmanci,
Istanbul Technical Univ., Bilgi Islem
Merkezi, Ayazaga, 80626 Istanbul, Turkey, 
phone 090 (1) 176-3254, fax 090 (1) 176-1734, 
e-mail harmanci@tritu.bitnet. 


Compsac 90, 14th Int’l Computer
Software and Applications Conf., Oct.
31-Nov. 2, Chicago. Contact Ifay F. Chang, 
Rm. 1B28, IBM T.J. Watson Research Center, 
PO Box 714, Yorktown Heights, NY 10595, 
phone (914) 789-7825. 


November 1990 


14th SCAMC, 1990 Symp. on Computer Applications
in Medical Care, Nov. 4-7, Washington,
DC. Cosponsors: George Washington
Univ. Medical Center et al. Contact SCAMC — 
Office of CEM, George Washington Univ. 
Medical Center, 2300 K St. NW, Washington, 
DC 20037, phone (202) 994-8928. 


1990 IFIP-IEEE Int’l Workshop on
Defect and Fault Tolerance in VLSI 
Systems, Nov. 5-7, Grenoble, France. Contact 
Gabriel Saucier, Inst. National Polytechnique 
de Grenoble/CSI, 46 avenue Felix-Viallet, 
38031 Grenoble Cedex, France, phone (33) 76-57-46-87,
fax (33) 76-50-23-21; or Tulin E.
Mangir, TRW, 1 Space Park, R2/2036, Redondo
Beach, CA 90278, phone (213) 813-3894,
fax (213) 813-3709.


24th Asilomar Conf. on Signals, Systems, 
and Computers, Nov. 5-7, Pacific Grove, 
Calif. Sponsors: Naval Postgraduate School et 
al. Contact George M. Dillard, Naval Ocean 
Systems Center, San Diego, CA 92152-5000, 
phone (619) 553-2478. 

ICCS 90, Int’l Conf. on Communication Systems,
Nov. 5-9, Singapore. Cosponsors: IEEE
Singapore Section et al. Contact ICCS 90, c/o 
Meeting Planners Pte. Ltd., 100 Beach Rd. 
#33-01, Shaw Towers, Singapore 0718. 

Second SIAM Conf. on Linear Algebra in 
Signals, Systems, and Control, Nov. 5-9, San 
Francisco, Calif. Contact Society for Industrial
and Applied Mathematics, 3600 University
City Science Center, Philadelphia, PA
19104-2688, phone (215) 382-9800, fax (215)
386-7999, e-mail siam@wharton.upenn.edu.

ICCC 90, 10th Int’l Conf. on Computer
Communication, Nov. 5-9, New Delhi, India.
Sponsor: Int’l Council on Computer Communication.
Contact Saroj Chowla or P.P. Gupta,
ICCC 90, CMC Ltd., A-5 Ring Rd., South Extension
Part I, New Delhi 110 049, India, phone
91 (11) 626-807, fax 91 (11) 684-4652. 


Intelligent Robotic Systems: Design and 
Applications, Nov. 6-7, Philadelphia. Sponsor:
SPIE. Contact Mohan M. Trivedi, Univ. of
Tennessee, Electrical and Computer Engineering,
Ferris Hall, Knoxville, TN 37996-2100,
phone (615) 974-5450. 


TAI 90, Second Computer Society
Int’l Conf. on Tools for Artificial Intelligence,
Nov. 6-9, Washington, DC. Cosponsors:
Rutgers Univ. et al. Contact Nikolas G.
Bourbakis, George Mason Univ., ECE Dept., 
Fairfax, VA 22030, phone (703) 425-3930. 

IEEE Workshop on the Management
of Replicated Data, Nov. 7-9, Houston.
Sponsor: IEEE Computer Society Technical
Committee on Operating Systems. Contact
Jehan-Francois Paris, Computer Science
Dept., Univ. of Houston, Houston, TX 77204-3475,
phone (713) 749-3943, e-mail paris@cs.uh.edu;
or Luis-Felipe Cabrera, IBM Almaden
Research Center, 650 Harry Rd., MC K55/803, 
San Jose, CA 95120, phone (408) 927-1838. 


1990 IEEE Workshop on VLSI Signal Processing,
Nov. 7-9, San Diego, Calif. Contact
Patti Fenstermacher, AT&T Bell Labs, 1243 S.
Cedar Crest Blvd., Allentown, PA 18103,
e-mail psf@aloft.att.com; or Howard S. Moscovitz,
AT&T Bell Labs, 1243 S. Cedar Crest
Blvd., Allentown, PA 18103, e-mail
mosc@aloft.att.com.


Int’l Workshop on Network and Operating 
System Support for Digital Audio and
Video, Nov. 8-9, Berkeley, Calif. Sponsor:
Int’l Computer Science Inst. Contact Ramesh
Govindan, ICSI, 1947 Center St., Suite 600, 
Berkeley, CA 94704-1105, phone (415) 642-4274,
ext. 136, e-mail av-workshop@berkeley.edu.

Computational Science in Industry and the 
Comprehensive Univ., Nov. 8-10, Pomona, 
Calif. Sponsor: Calif. State Polytechnic Univ. 
at Pomona. Contact Bruce P. Hillam, Computer
Science Dept., Calif. State Polytechnic
Univ., 3801 W. Temple Ave., Pomona, CA 
91768, phone (714) 869-3440. 

Fourth Southeastern Small-College Computing
Conf., Nov. 9-10, Hickory, N.C. Sponsor:
Consortium for Computing in Small Colleges.
Contact Susan Dean, Samford Univ.,
800 Lakeshore Dr., Birmingham, AL 35229,
Bitnet stdean@samford.bitnet. 

ICCAD 90, IEEE Int’l Conf. on Computer-Aided
Design, Nov. 11-15, Santa
Clara, Calif. Cosponsor: IEEE Circuits and
Systems Society. Contact Pat Pistilli, MP Associates,
7490 Clubhouse Rd., Suite 102,
Boulder, CO 80301, phone (303) 530-4562 or
4333.

Vision 90, Nov. 12-15, Detroit. Cosponsors: 
Society of Manufacturing Engineers and SME 
Machine Vision Assoc. Contact Lisa Machacki,
Vision 90, SME Conf. Dept., PO Box 930,
Dearborn, MI, phone (313) 271-1500, ext. 369. 

Supercomputing 90, Nov. 12-16, New
York City. Cosponsor: ACM. Contact
Joanne L. Martin, IBM T.J. Watson Research
Center, PO Box 218, Route 134, Yorktown
Heights, NY 10598, phone (914) 945-3285,
e-mail jlmart@ibm.com; or Supercomputing 90,
IEEE Computer Society, 1730 Massachusetts 
Ave. NW, Washington, DC 20036-1903, 
phone (202) 371-1013. 

Seventh Governor’s Symp. on High
Technology, Nov. 13-15, Kauai, Hawaii.
Sponsor: State of Hawaii. Contact William
M. Ball, State of Hawaii, 300 Kahelu St.,
Suite 35, Mililani, HI 96789, phone (808) 625-5293.

Fall Comdex, Nov. 13-17, Las Vegas. Contact
Interface Group, 300 First Ave., Needham,
MA 02194, phone (617) 449-6600.

PRICAI 90, Pacific Rim Int’l Conf. on
Artificial Intelligence 90, Nov. 14-16,
Nagoya-shi, Aichi, Japan. Sponsor: Japanese
Society for Artificial Intelligence et al. Contact
Teruo Fukumura, Inter Group Corp., Akasaka
Yamakatsu Bldg., 8-5-32 Akasaka, Minato-ku,
Tokyo 107, Japan, phone (03) 479-5535.

14th Western Educational Computing 
Conf., Nov. 15-16, Irvine, Calif. Sponsor: 
California Educational Computing Consortium.
Contact Oliver Seely, Jr., California
State Univ. at Dominguez Hills, Chemistry, 
1000 E. Victoria St., Carson, CA 90747. 

AIDA 90, Sixth Conf. on Artificial Intelligence
and Ada, Nov. 15-16, Reston, Va. Sponsors:
George Mason Univ. et al. Contact AIDA
90, Computer Science Dept., George Mason
Univ., 4400 University Dr., Fairfax, VA
22030, phone (703) 323-2713, fax (703) 323-2630,
e-mail aida@gmuvax.gmu.edu.

Cognitiva 90, Nov. 20-23, Madrid. 
Sponsor: AFCET. Contact Cognitiva 90, 
c/o Assoc. Francaise pour la Cybernetique
Economique et Technique, 156 Bd. Pereire, 
75017 Paris, France, phone 33 (1) 4766-2419, 
fax 33 (1) 4267-9312. 

AI 90, Australian Joint Artificial Intelligence
Conf., Nov. 21-23, Perth, Western Australia.
Sponsor: Australian Computer Society.
Contact Les Kitchen, Univ. of Western Australia,
Computer Science Dept., Nedlands, Western
Australia, 6009, phone 61 (9) 380-2281,
e-mail ai90@wacsvax.oz.au.

IEEE 1990 Conf. on Software Maintenance,
Nov. 26-29, San Diego, Calif.
Contact Thomas M. Pigoski, USN, NSGD 
Pensacola, Corry Station, Pensacola, FL 
32511, phone (904) 452-6399. 

NIPS 90, IEEE Conf. on Neural Information 
Processing Systems, Nov. 26-29, Denver, 
Colo. Contact Kathie Hibbard, Engineering 
Center, Univ. of Colorado, Campus Box 425, 
Boulder, CO 80309-0425. 

Micro 23, 23rd Symp. and Workshop
on Microprogramming and Microarchitecture,
Nov. 27-29, Orlando, Fla. Cosponsor:
ACM. Contact Chris Papachristou,
Case Western Reserve Univ., Computer Engineering
and Science Dept., Cleveland, OH
44106, phone (216) 368-5277, e-mail
cap@alpha.ces.cwru.edu.

Iecon 90, 16th Conf. of the IEEE Industrial 
Electronics Society, Nov. 27-30, Pacific 
Grove, Calif. Contact Robert Begun, 23609 
Skyview Terr., Los Gatos, CA 95030, phone 
(408) 353-1560. 

IAPR Workshop on Machine Vision Applications,
Nov. 28-30, Tokyo. Sponsor: Int’l Assoc.
for Pattern Recognition. Contact Mikio
Takagi, Inst. of Industrial Science, Univ. of
Tokyo, 7-22-1 Roppongi, Minatoku, Tokyo
106, Japan, phone 81 (3) 479-0289, fax 81 (3)
423-2834, e-mail takagi@tkl.iis.u-tokyo.ac.jp.


December 1990 

First Int’l Symp. on Uncertainty and
Analysis: Fuzzy Reasoning, Probabilistic
Methods, and Risk Management, Dec.
3-5, College Park, Md. Sponsors: Univ. of
Maryland et al. Contact Bilal M. Ayyub, Civil
Engineering Dept., Univ. of Maryland, College
Park, MD 20742.

ACM SIGSoft 90, Fourth Symp. on Software 
Development Environments, Dec. 3-5, Irvine,
Calif. Sponsor: ACM. Contact Dewayne
E. Perry, AT&T Bell Labs, 600 Mountain Ave., 
Murray Hill, NJ 07974, phone (201) 582-2529. 

ICCV 90, Third Int’l Conf. on Computer
Vision, Dec. 4-7, Osaka, Japan.
Contact ICCV 90, IEEE Computer Society,
1730 Massachusetts Ave. NW, Washington, 
DC, 20036-1903, phone (202) 371-1013. 

11th Real-Time Systems Symp., Dec.
5-7, Orlando, Fla. Sponsor: IEEE Computer
Society Technical Committee on Real-Time
Computing. Contact Doug Locke, IBM
— MS 409, Systems Integration Div., 6600 
Rockledge Dr., Bethesda, MD 20817, phone 
(301) 493-1496, e-mail cdl@cs.cmu.edu. 

CASE 90, Fourth Int’l Workshop on
Computer-Aided Software Engineering,
Dec. 5-8, Irvine, Calif. Contact Elliott J.
Chikofsky, Radius Systems, 75 Lexington St.,
Burlington, MA 01803, phone (617) 494-8200.

WSC 90, 1990 Winter Simulation 
Conf., Dec. 9-12, New Orleans. Contact 
Randall P. Sadowski, Systems Modeling 
Corp., 504 Beaver St., Sewickley, PA 15143, 
phone (412) 741-3727. 

San Diego Workshop on Volume Visualization,
Dec. 10-12, La Jolla, Calif.
Cosponsor: ACM. Contact T. Todd Elvins, 
SDSC, Box 85608, San Diego, CA 92038, 
phone (619) 534-5128. 

Second IEEE Symp. on Parallel and
Distributed Processing, Dec. 10-12,
Dallas. Cosponsor: Dallas Chapter of the IEEE
Computer Society. Contact Behrooz Shirazi,
Computer Science Dept., Southern Methodist
Univ., 6425 Airline Rd., Dallas, TX 75205-2337,
phone (214) 692-2874, e-mail
shirazi%smu.uucp@uunet.uu.net.

1990 IEEE Workshop on Languages
and Architectures for Automation,
Dec. 19-21, Honolulu, Hawaii. Sponsors: Pacific
Int’l Center for High Technology Research
et al. Contact D.Y.Y. Yun, Univ. of Hawaii,
711 Kapiolani Blvd., Suite 200, Honolulu,
HI 96813-5249, phone (808) 539-1532, fax
(808) 941-1399; or Shi-Kuo Chang, 322 Alumni
Hall, Univ. of Pittsburgh, Pittsburgh, PA
15260, phone (412) 624-8493, fax (412) 624-8465,
e-mail chang@vax.cs.pitt.edu.


January 1991 

Fourth CSI/IEEE Int’l Symp. on VLSI
Design, Jan. 5-8, New Delhi. Sponsors:
Computer Society of India et al. Contact Yashwant
K. Malaiya, Computer Science Dept.,
Colorado State Univ., Fort Collins, CO 80523,
phone (303) 491-7031, fax (303) 491-2293,
e-mail malaiya@ravi.cs.colostate.edu; or D.
Roy Chowdhury, Gateway Design Automation,
SDF#A-1, Noida Export Processing
Zone, PO NEPZ, Noida 201305, India, phone
91 (05736) 62342, fax 91 (05736) 62343.

Int’l Conf. on Multimedia Information
Systems, Jan. 16-18, Singapore.
Contact Juzar Motiwalla, Inst. of Systems Science,
Nat’l Univ. of Singapore, Heng Mui
Keng Terr., Kent Ridge, Singapore 0511,
phone (65) 772-2075.

PADS, Workshop on Parallel and Distributed
Simulation, Jan. 21-23,
Anaheim, Calif. Cosponsors: ACM, SCS. Contact
David M. Nicol, Computer Science Dept.,
College of William and Mary, Williamsburg, 
VA 23185, phone (804) 221-3458, e-mail 
nicol@cs.wm.edu. 

IEEE Int’l Conf. on Wafer Scale Integration,
Jan. 29-31, San Francisco,
Calif. Cosponsors: IEEE Components, Hybrids,
and Manufacturing Technology Society.
Contact Terry Chappell, 730 Encino Dr.,
Aptos, CA 95003, phone (408) 662-1936; or R.
Mike Lea, Brunel Univ., Uxbridge UB8 3PH,
UK, phone (44) 895-74000, ext. 2821, fax (44)
895-58728, e-mail mike.lea@brunel.ac.uk.


February 1991 

CAIA 91, Seventh IEEE Conf. on Artificial
Intelligence Applications, Feb.
24-28, Miami Beach, Fla. Contact IEEE Computer
Society, 1730 Massachusetts Ave. NW,
Washington, DC 20036-1903, phone (202)
371-1013.

EDAC 91, European Design Automation
Conf., Feb. 25-28, Amsterdam.
Sponsor: IEE. Contact Secretariat, EDAC 91,
CEP Consultants, 26-28 Albany St., Edinburgh
EH1 3QH, Scotland, phone 44 (31) 557-2478,
fax 44 (31) 557-5749.

Compcon Spring 91, Feb. 25-Mar. 1,
San Francisco. Contact Compcon Spring
91, IEEE Computer Society, 1730 Massachusetts
Ave. NW, Washington, DC 20036-1903,
phone (202) 371-1013.


March 1991 

Fifth Int'l Workshop on High-Level 
Synthesis, Mar. 3-6, Bühlerhöhe, West
Germany. Cosponsors: IEEE et al. Contact 
Raul Camposano, IBM T.J. Watson Research 
Center, PO Box 218, Yorktown Heights, NY 
10598, phone (914) 945-3871, e-mail raulc@ 


April 1991 


Second Int’l Symp. on Database Systems
for Advanced Applications, Apr.
2-4, Tokyo. Sponsor: Information Processing
Society of Japan. Contact Yahiko Kambayashi,
Computer Science Dept., Kyushu Univ.,
6-10-1 Hakozaki, Higashi Fukuoka 812, Japan,
phone 81 (92) 641-1101, ext. 5407; or Yoshifumi
Masunaga, Univ. of Library and Information
Science, 1-2 Kasuga, Tsukuba, Ibaraki
305, Japan, phone 81 (298) 52-0511, ext. 340,
fax 81 (298) 52-4326, e-mail masunaga@ulis.ac.jp.

IEEE Infocom 91, Conf. on Computer
Communications, Apr. 7-11, Miami,
Fla. Cosponsors: IEEE Computer and Communications
Societies. Contact N. Shacham,
IEEE Infocom 91, SRI Int’l, 333 Ravenswood
Ave., Menlo Park, CA 94025, phone (415) 859-5710,
e-mail shacham@sri.com.

First IEEE Int’l Workshop on Interoperability
in Multidatabase Systems,
Apr. 8-9, Kyoto, Japan. Contact Ahmed K. Elmagarmid,
Purdue Univ., Computer Sciences
Dept., West Lafayette, IN 47907, phone (317)
494-1998; or Yutaka Matsushita, Instrumentation
Dept., Keio Univ., Hiyoshi, Yokohama,
Japan, phone 81 (44) 63-1141, ext. 3564.

ASPLOS 4, Fourth Int’l Conf. on
Architectural Support for Programming
Languages and Operating Systems,
Apr. 8-11, Santa Clara, Calif. Sponsor: ACM. 
Contact Bob Rau, Hewlett-Packard Labs, 1501 
Page Mill, Bldg. 3U, Palo Alto, CA 94304, fax 
(415) 857-8558, e-mail rau@hplabs.hp.com. 

Seventh Int’l Conf. on Data Engineering,
Apr. 8-12, Kobe, Japan. Contact
Ming T. (Mike) Liu, Computer and Information
Science Dept., Ohio State Univ., 2036 Neil
Ave., Columbus, OH 43210-1277, phone
(614) 292-1837, e-mail liu@cis.ircc.ohio-state.edu;
or Data Engineering 91, IEEE Computer
Society, 1730 Massachusetts Ave. NW,
Washington, DC 20036-1903, phone (202)
371-1013, fax (202) 728-0884.


ETC 91, 1991 European Test Conf., 
Apr. 17-19, Munich, West Germany. 
Sponsor: VDE (Zentralstelle Tagungen und 
Seminare). Contact Peter Stilke, VDE, Stresemannallee
15, D-6000 Frankfurt 70, FRG,
phone (69) 6308-203, fax (69) 6308-273. 

CHI 91, 1991 Conf. on Human Factors 
in Computing Systems, Apr. 27-May 2, 
New Orleans. Sponsor: ACM. Contact Keith 
Butler, Boeing, Advanced Technology Center, 
PO Box 24346, M/S 7L-64, Seattle, WA 98124, 
phone (206) 865-3389; or June Davis, 13 Annapolis
St., Annapolis, MD 21401, phone
(301) 269-6801. 


May 1991 

ICSE 13, 13th Int’l Conf. on Software
Engineering, May 13-16, Austin,
Texas. Cosponsor: ACM. Contact ICSE 13,
Bryan Fugate, MCC, 3500 W. Balcones Center
Dr., Austin, TX 78759-6509, phone (512) 338-3330;
MCC, PO Box 200015, Austin, TX
78720-0015; or ICSE 13, IEEE Computer Society,
1730 Massachusetts Ave. NW, Washington,
DC 20036-1903, phone (202) 371-1013.

CompEuro 91, IEEE Int’l Conf. on
Advanced Computer Technology, Reliable
Systems, and Applications, May 13-17,
Bologna, Italy. Cosponsors: IEEE Region 8
et al. Contact Vito Monaco, Dip. Eletronica Informatica
E Sistemistica, Univ. Di Bologna,
Viale Risorgimento, 1-60136, Bologna, Italy.

CCW 91, Third IEEE Conf. on Computer
Workstations, May 15-17, Cape
Cod, Mass. Sponsor: IEEE Computer Society
Technical Committee on Operating Systems.
Contact Luis-Felipe Cabrera, IBM Almaden
Research Center, MC K55/801, 650 Harry Rd.,
San Jose, CA 95120-6099, phone (408) 927-1838,
e-mail cabrera@ibm.com; or Kenneth
Kane, Boston Development Center, Sun Microsystems,
2 Federal St., Billerica, MA
01802, phone (508) 671-0367, e-mail kkane@


ICDCS 91, 11th Int’l Conf. on Distributed
Computing Systems, May 20-24,
Arlington, Texas. Contact Bill D. Carroll,
Computer Science Dept., Engineering, Univ.
of Texas at Arlington, Box 19015, Arlington,
TX 76019-0015, phone (817) 273-3785,
e-mail carroll@evax.ari.utexas.edu.

SESAW, Software Engineering Standard
Application Workshop, May 20-24,
San Diego, Calif. Contact Vera Edelstein,
Nynex, 500 Westchester Ave., White Plains, 
NY 10604, phone (914) 683-2888. 

ISCA 18, 18th Int’l Symp. on
Computer Architecture, May 26-30,
Toronto, Canada. Cosponsor: ACM. Contact
K.C. Smith, Univ. of Toronto, Electrical Engineering
Dept., Toronto, Ont. M5S 1A4, Canada,
phone (416) 978-5033.


June 1991 


Fourth Int’l Conf. on Industrial and
Engineering Applications of Artificial
Intelligence and Expert Systems, June 2-5,
Kauai, Hawaii. Sponsors: ACM et al. Contact
Moonis Ali, Univ. of Tennessee Space Inst.,
MS 15, B.H. Goethert Pkwy., Tullahoma, TN
37388-8897, phone (615) 455-0631, ext. 236,
fax (615) 454-2354, e-mail alif@utsivl.bitnet.

IEEE Computer Society Conf. on
Computer Vision and Pattern Recognition,
June 3-7, Maui, Hawaii. Contact Shahriar
Negahdaripour, Electrical Engineering
Dept., Univ. of Hawaii, Honolulu, HI 96822,
e-mail shahriar@wiliki.eng.hawaii.edu.

SCM 3, Third Int’l Software Configuration
Management Workshop, June
12-14, Trondheim, Norway. Cosponsors:
ACM et al. Contact Reidar Conradi, Computer
Systems and Telematics Div., Norwegian Inst.
of Technology, N-7034 Trondheim, Norway,
phone 47 (7) 593-444; or Peter Feiler, Software
Engineering Inst., Carnegie Mellon Univ.,
Pittsburgh, PA 15213-3890, phone (412) 268-7790,
e-mail phf@sei.cmu.edu.

DAC 91, 28th ACM/IEEE Design
Automation Conf., June 16-21, Orlando,
Fla. Cosponsor: ACM. Contact Pat Pistilli,
MP Associates, 7490 Clubhouse Rd., Suite
102, Boulder, CO 80301, phone (303) 530-4333.

10th Symp. on Computer Arithmetic,
June 26-28, Grenoble, France. Cosponsors:
ACM et al. Contact Jean-Michel Muller,
Lab. LIP-IMAC, Ens. Lyon, 69364 Lyon
Cedex 07, France, phone 33 (72) 72-8229.

IEEE COMPUTER SOCIETY
Membership / Subscription Application 



BENEFITS 



Computer 

Computer comes automatically 
with membership. Written, 
reviewed, and refereed by 
experts, it features survey and 
tutorial articles covering the 
entire computer field, and 
departments such as new 
products, new product reviews, 
Standards, and a reader forum 
called "The Open Channel." 
(monthly). 


Technical Committees 

Participate in one or more of our 33 technical 
committees — networks of professionals with common 
interests in specialty areas within computer hardware, 
software, and applications. 

Standards Working Groups 
Participate in the development of the more than 100 
standards projects currently sponsored by the society 
in such diverse areas as software engineering, local 
area networks, microprocessor buses, design automation,
programming languages, and standards
definitions.

Computer Society Press Books 

Receive discounts of up to 50% on over 600 titles 
covering a broad spectrum of computer science topics 
such as networking, communications, advanced 
systems, image processing, security, artificial 
intelligence, and design automation. Over 60 new titles 
are published annually. 

Conferences and Tutorials 
Choose from more than 100 conferences annually, 
ranging from large industry-oriented conferences 
replete with exhibits to small, highly interactive 
workshops. Members receive special low rates. 


Schedule of Fees 


To join: see item 1, 2, or 3. 

To subscribe: see item 4. 

Membership dues and periodical subscriptions are annualized to, and expire on, 
December 31. Choose full- or half-year rate schedules depending on date of 
receipt by the Computer Society as indicated below.

Half Year (Mar 1-Aug 31)    Full Year (Sept 1-Feb 28)


1. I don’t belong to the IEEE and I want
to join just the Computer Society


□ $23.50 □ $47.00 


2. I don’t belong to the IEEE and I want
to join both the Computer Society and the IEEE*

I reside in Region 1-6 (United States). □ $47.50 □ $95.00 

I reside in Region 7 (Canada). □ $43.50 □ $87.00 

I reside in Region 8 (Europe, Africa, or the Middle East) □ $43.00 □$86.00 

I reside in Region 9 (Latin America). □ $39.50 □ $79.00 

I reside in Region 10 (Asia and Pacific). □$38.50 □ $77.00 

id the Computer Society may deduct $5 off the 


3. I already belong to the IEEE and I want
to join the Computer Society.

IEEE Member Number_ 


□ $9.00 □$18.00 


4. OPTIONAL PERIODICALS for new or current members

IEEE Computer Graphics and Applications (3061) 6 □ $10.00 
IEEE Design and Test (3111) 

IEEE Expert (3151) . 

IEEE Micro (3071) . 

IEEE Software (3121) . 

Transactions on Computers (1161) . 

Transactions on Knowledge and 

Data Engineering (1471) . 

Transactions on Parallel and 

Distributed Systems (1501) . 

Transactions on Pattern Analysis and


□ $10.00 

□ $20.00 

□ $10.50 

□ $21.00 

□ $ 9.00 

□ $18.00 

□ $ 9.50 

□ $19.00 

□ $10.00 

□ $20.00 

□ $10.00 

□ $20.00 

□ $ 5.00 

□ $10.00 

□ $ 5.50 

□ $11.00 

□ $10.00 

□ $20.00 

□ $10.00 

$ 

□ $20.00 


h, German, Swiss, Japanese, o 


PRICES EXPIRE 12/31/90 


ia □ Master Card □ American Express □ Eurocard 


Charge Card Number 


U 


te governed by IEEE's and the society’s constitutions, bylaws, and statements of


EDUCATION (highest level co 


Return to: IEEE Computer Society, 10662 Los Vaqueros Circle, P.O. Box 3014, Los Alamitos, CA 90720-1264 USA.

Residents of Europe mail to: IEEE Computer Society, 13, Avenue de l’Aquilon, B-1200, Brussels, BELGIUM.

Asian / Pacific residents mail to: IEEE Computer Society, Ooshima Building, 2-19-1 Minami-Aoyama, Minato-ku, Tokyo 107 JAPAN. 


CAREER OPPORTUNITIES 


RATES: $12.00 per line (ten lines minimum).
Average five typeset words per
line, eight lines per column inch. Add 
$10 for box number. Send copy at least 
one month prior to publication date to: 
Marian B. Tibayan, Classified Advertising,
COMPUTER Magazine, 10662
Los Vaqueros Circle, PO Box 3014, 
Los Alamitos, CA 90720-1264; (714) 
821-8380; fax (714) 821-4010. 


DESIGN ENGINEER 

Design Engineer - Responsible for the creation,
design and development of high performance
raster graphic controllers. A laboratory
course utilizing bit slice hardware and
firmware. One year experience in microprocessor
and graphics system design, programmable
logic and high speed design is required.
A Bachelor of Science degree in Electrical
Engineering. 40 hours per week, and
$721.15 per week. Apply at the Texas Employment
Commission, Houston, Texas,
J.O. #5424476. Ad paid by an equal
employment opportunity employer.


DEPARTMENT HEAD 
DEPARTMENT OF COMPUTER 
SCIENCE 

Louisiana Tech University 

Nominations and applications are invited
for the position of Professor and Head of the
Department of Computer Science at Louisiana
Tech University. Applicants must have a
Ph.D. in Computer Science or a closely
related field. In addition, the individual must
have a superior record in teaching and research
at the undergraduate and graduate
level. The position will be a twelve month
appointment and offer a competitive salary.
Candidates must have a demonstrated commitment
to excellence in teaching, research,
and service and the ability to work effectively
with faculty, students, and administration.

The Department of Computer Science at
Louisiana Tech University has seven (7)
faculty and 125 students. The Department
offers a CSAB accredited B.S. degree, the
M.S. degree, and is currently considering
joint establishment of an interdisciplinary
Ph.D. program. The Computer Science Department
features a departmental Ethernet
linking faculty offices and laboratories. The
Department computer equipment currently
includes 10 Sun workstations with a file
server, 2 microvaxes, an IBM RT, and numerous
PC’s and peripherals.

Louisiana Tech University is a state-supported
institution with over 10,000 students
enrolled in six colleges. The College of Engineering
is composed of 7 academic departments
with a total enrollment of about 1500
students. Computer Science at Louisiana
Tech is housed in the College of Engineering
and enjoys close relationships with numerous
faculty members and departments on
joint ventures. Louisiana Tech University is
located in Ruston, situated in beautiful North
Louisiana. The community is warm and vital
with a moderate cost of living.

Louisiana Tech is expanding its research
and graduate programs while maintaining
and enhancing its reputation for excellence
in teaching. This is an exciting time in Louisiana
Tech’s history, and excellent opportunities
exist for professional growth and
development. If you want to make a significant
impact, please apply. If you know
others who fit this description, please nominate them.

Applicants should submit resumes, including
a list of three references to:

Barry A. Benedict, Dean 
College of Engineering 
Louisiana Tech University 
P.O. Box 10348 
Ruston, LA 71272 

Nominations can be sent to the same address,
and nominees will be contacted for
needed information. The review of applications
will begin immediately and will continue
until the position is filled.

Louisiana Tech University is an equal opportunity
university. Women and minorities
are encouraged to apply. 


SYSTEMS ENGINEER 

Central Ohio Computer & Data Processing
Consultant firm seeking Systems Engineer
to design, develop, test and implement
new features and methodology for an intelligent
data base that controls and operates
International Networks for overseas telecommunication
systems. Must analyze requirements
for enhancements to the system,
evaluate software design; define, design, implement
and test modifications. Must document
modifications and coordinate efforts of
other engineers on project developments.
Additional responsibilities include design,
development, implementation and testing of
corrections, modifications, and improvements
to previously released software systems.
Requires M.S. in Computer Science and
knowledge of UNIX (tm) operating system,
C programming language, software design,
database management, compiler construction,
and distributed systems as demonstrated
by not less than 9 semester hours of
study in each topic. Will accept applicant
who has substantially completed masters degree.
No exp. required. 40 hrs/wk, 7:30-4:30,
Mon.-Fri., $35,000/yr. Must have
proof of legal authority to work permanently
in U.S. Send resume in duplicate (no calls)
to J. Davies, JO #1219734, Ohio Bureau of
Employment Services, P.O. Box 1618, Columbus,
OH 43216.


INFORMATION SCIENTIST 

Duties: research, develop & maintain
hospital information system. Req: MS in
Medical Informatics (or equivalent), and 1
yr exp. using the HELP Information System.
Permanent, full-time, 40 hours per
week. Salary: $27,500.00 per year. Send
resume to Utah Job Service, P.O. Box
117750, Salt Lake City, Utah 84147, Job
Order #3936378.


SENIOR SYSTEMS ANALYST 

Central Ohio Computer Research firm
seeking a senior systems analyst for research,
design and development for evolution
of Knowledge Integration Shell (KI-Shell)
on parallel platforms. KI-Shell is a proprietary
software product to provide process
modelling; user-interface specification; information
modelling, storage and retrieval; and
application integration. Direct responsibility
for KI-Shell on UNIX (tm) platform and programming
and system support for applications
on IBM/VM-370, VAX/VMS, IBM-PC
and Apple Macintosh systems. Responsible
for direct contact with customers, determination
of customer requirements, and design
system to meet specifications. Ph.D. required.
Expertise in programming environments;
parallel programming and parallel
operating systems; and information modeling
as demonstrated by one refereed publication
in each area of expertise. No exp. required.
40 hrs/wk, 8-5, Mon.-Fri., $52,000/yr.
Must have proof of legal authority to work
permanently in U.S. Send resume in duplicate
(no calls) to J. Davies, Job No.
1219642, Ohio Bureau of Employment Services,
P.O. Box 1618, Columbus, OH
43216.


SOFTWARE DESIGN ENGINEER 

Central Ohio Computer & Data Processing
Consultant firm seeking Software Design
Engineer to design, execute, and evaluate
test plans used to verify the correct network
operation of software systems interfacing
with real time database on telecommunications
network. Systems have message
transfer capability with Open System Interface,
Layers 1, 2 & 3 and other Signalling
System 7 features. Analyze efficiency of
system and identify improvements for test
environment. Requires B.S. in Computer
Science and 2 1/2 years experience of development
of Common Channel Signalling
System 7 (SS7) software & system testing.
Will accept B.S. in Electrical Engineering. 40
hrs/wk, 7:30-4:30, Mon.-Fri., $45,000/yr.
Must have proof of legal authority to work
permanently in U.S. Send resume in duplicate
(no calls) to J. Davies, JO #1219750,
Ohio Bureau of Employment Services, P.O.
Box 1618, Columbus, OH 43216.


CHAIRMAN,

ELECTRICAL ENGINEERING 
University of Pittsburgh 

Nominations and applications are invited
for the position of Professor and Chairman,
Department of Electrical Engineering at the
University of Pittsburgh. Candidates should
possess a vision of the future directions for
Electrical Engineering and have the leadership
and managerial skills to promote and
implement that vision. The successful candidate
must have a demonstrated record of excellence
in both teaching and research. An
earned doctorate is required.

The Electrical Engineering Department
has 24 tenure stream faculty members spanning
the areas of computer engineering,
electronics, signal processing, communications,
systems, control, and power. The Department
offers BS, MS, and Ph.D. degrees,
and has an enrollment of approximately 340
undergraduate students (sophomore through
senior level), and 180 graduate students.

The University of Pittsburgh, entering its
third century, has established a strong tradition
of education, research and service and is
a member of the select American Association
of Universities. The Electrical Engineering
Department celebrates its centennial anniversary
in 1990. This position will be filled
by September 1, 1991. Nominations or applications,
including resume and names and
telephone numbers of references should be
sent before October 1, 1990 to:

Professor Larry J. Shuman 
Associate Dean of Engineering 

Chairman of Electrical Engineering
Search Committee
University of Pittsburgh 
Pittsburgh, PA 15261 
Telephone: 412-624-9814 
The University is an affirmative action 
and equal opportunity employer. 


THE UNIVERSITY OF KENTUCKY 

The University of Kentucky is seeking ap¬ 
plications and/or -nominations to fill an 
endowed chair in the area of Computer En¬ 
gineering. The Robinson Chair in Computer 
Engineering requires a nationally recognized 
scholar and researcher in computer engi¬ 
neering with a Ph.D. in electrical and/or 
computer engineering. The candidate must 
have a record of proven experience which 
will enable him/her to provide the leadership 
necessary to elevate the University of Ken¬ 
tucky to nationally recognized status in this 
area. It is expected that a Center of Excel¬ 
lence in Computer Engineering will develop 
out of these efforts. In order to achieve these 
goals, the appointee will establish and main¬ 
tain a strong program of research, including 
extramural funding, develop and teach com¬ 
puter engineering courses at the graduate 
and undergraduate levels, and provide lead¬ 
ership and mentorship to faculty members 
and graduate students in this area. The posi¬ 
tion carries a salary which is highly competi¬ 
tive with today’s market, six faculty positions 
specifically designated in Computer Engi¬ 
neering, excellent department and university 
computing facilities including a supercompu¬ 
ter and a host of other computers and work¬ 
stations linked by a campus-wide data com¬ 
munications network, necessary laboratory 
space and equipment and above all, strong 
university administrative support for the 
above mentioned goals for Computer Engi¬ 
neering. The Center for Computational Sci¬ 
ence, University Computing Center and 
Computer Science Department are included 
in supporting units with key interest in these 
plans. Applications and/or nominations 
should be sent to Dr. S. A. Nasar, Chair¬ 
man, Department of Electrical Engineering, 
University of Kentucky, Lexington, KY 
40506-0046. Additional information may 
also be obtained by writing to the above ad¬ 
dress or calling at (606) 257-8042. The Uni¬ 
versity of Kentucky is an equal opportunity/ 
affirmative action employer. 


THE ARTIFICIAL INTELLIGENCE 
LABORATORY 

AT MASSACHUSETTS INSTITUTE 
OF TECHNOLOGY 
invites applicants to the position of 
COMPUTER SYSTEMS ENGINEER 
Position # R90-077 

Individual is needed to participate in a re¬ 
search project leading to the design of one of 
the world’s fastest parallel computers. Work 
involves design of chip and board-level hard¬ 
ware and system software as well as system 
performance measurement and evaluation. 
The position involves some coordination of 
other designers. 

Qualifications: BS/MS in Computer 
Science or Electrical Engineering. Experi¬ 
ence with digital system design, systems soft¬ 
ware, and parallel computer systems is 
required. 

Qualified applicants, send resumes and 
names of references to: 

Ms. Marilyn Melithoniotes 
MIT Artificial Intelligence Laboratory 
545 Technology Square, Room 807 
Cambridge, MA 02139 

MIT is an Affirmative Action/Equal 
Opportunity Employer. 

This is a non-smoking environment. 


SOUTHERN METHODIST UNIVERSITY 

Department of Computer Science 
and Engineering 

The Department of Computer Science 
and Engineering (CSE) invites applications 
for faculty positions at both senior and junior 
levels. Applicants for the senior position 
should have an outstanding reputation in 
academic pursuits and a strong publication 
record. Applicants for the junior positions 
should have a Ph.D. degree and demon¬ 
strated research and teaching experience in 
computer science. We are interested in ap¬ 
plicants in the following areas: computer ar¬ 
chitecture, computer systems, computer lan¬ 
guages, and database systems. 

SMU is a private university with approx¬ 
imately 9,000 students. CSE is in the School 
of Engineering and Applied Science, where 
a close working relationship exists with the 
Department of Electrical Engineering. The 
Department has extensive contacts with 
computer-related and engineering-oriented 
industrial firms that distinguish Dallas as one 
of the top centers for high technology. 

CSE presents a balanced program of re¬ 
search and education at all levels and has 
been offering Ph.D. degrees since 1970. 

Applicants should send a complete 
resume, including the names of three refer¬ 
ences to: J.L. Kennington, Chair, Depart¬ 
ment of Computer Science and Engineer¬ 
ing, Southern Methodist University, Dallas, 
Texas 75275-0122. 

SMU is an equal opportunity/affirmative 
action employer. Applications from women 
and minorities are particularly encouraged. 


UNITED STATES AIR FORCE 
ACADEMY 

Department of Computer Science 
Visiting Faculty Position 

The Department of Computer Science of 
the United States Air Force Academy invites 
applications for a Visiting Professor/Associ¬ 
ate Professor/Assistant Professor position. 
We seek qualified applicants with extensive 
experience teaching computer science and a 
record of scholarly activity. Duties will in¬ 
clude teaching undergraduate courses, re¬ 
viewing our academic program, and pro¬ 
moting undergraduate and faculty research. 
Applicant must be a U.S. citizen and should 
have a demonstrated commitment to under¬ 
graduate computer science programs. The 
appointment is normally for one year and 
will begin July 1991. Salary is commensu¬ 
rate with your current salary level. To apply, 
please send your vita by 1 September 1990 
to: Chairman, Department of Computer Sci¬ 
ence, United States Air Force Academy, 
USAF Academy CO 80840. 


SYSTEMS ENGINEER 

Central Ohio Computer & Data Proces¬ 
sing Consultant firm seeking Systems Engi¬ 
neer to review and analyze the requirements 
and architecture of a computer system used 
in telecommunications signalling networks 
and verify its performance in a network en¬ 
vironment. Design and write test plans and 
test procedures; execute tests in an inte¬ 
grated testing laboratory; evaluate and pre¬ 
pare reports on results. Design and develop 
software tools in C programming language 
to simulate traffic workload and analyze 
system functionality. Identify and introduce 
tools to enhance test management and im¬ 
prove testing environment. Requires M.S. in 
Computer Science, knowledge and exper¬ 
tise in systems performance evaluation; 
computer networking; C programming lan¬ 
guage; computer architecture and operating 
systems as demonstrated by not less than six 
(6) hours of course work or research in each 
area of expertise at graduate level. No exp. 
required. 40 hrs/wk, 7:45-4:30, Mon.-Fri., 
$38,000/yr. Must have proof of legal au¬ 
thority to work permanently in U.S. Send 
resume in duplicate (no calls) to L. Eggle¬ 
ston, JO# 1219689, Ohio Bureau of Em¬ 
ployment Services, P.O. Box 1618, Colum¬ 
bus, OH 43216. 
SOFTWARE ENGINEER 

Company in Massachusetts needs a Soft¬ 
ware Engineer for work in a Central Ohio 
location. Job duties are to design, program 
and maintain software for an “operations 
system” designed to provide end-to-end net¬ 
work surveillance including the monitoring 
and analysis of electronic switches and trans¬ 
mission facilities, of a telephone company. It 
involves software development using C lan¬ 
guage in Unix/SunOS environment, for pro¬ 
viding quick and reliable communication 
over a computer network between pro¬ 
cesses, running in a distributed system using 
Sun Workstations, requiring fast data base 
accesses to display critical alarm conditions, 
in Real Time. Job requires a Master of Sci¬ 
ence in Computer Science, degree specializ¬ 
ing in Distributed Real Time Systems and 
Computer Networks. One course in Data 
Base Management Systems, Electrical Engi¬ 
neering or Electronics Graduate Course or 
paper or thesis using C, Unix, SunOS and 
Sun 3/4 Workstations. 40 hours per week, 
8:00 a.m. - 5:00 p.m., $30.30 per hour. 
Send (2) two resumes (NO CALLS) to C. 
Bussard, JO. # 1219678, Ohio Bureau of 
Employment Services, P.O. Box 1618, 
Columbus, Ohio 43216. Must have proof of 
legal authority to work permanently in the 
United States. 


SENIOR RESEARCH ASSISTANT 

The Research Foundation of the State 
University of New York at Plattsburgh seeks 
a Senior Research Scientist for acoustics/ 
vibration research at the Auditory Research 
Laboratories at SUNY Plattsburgh. Appli¬ 
cants must have a Ph.D. in Mechanical or 
Electrical Engineering, experience in the 
use of FORTRAN and C, graduate courses 
in the areas of Computer-Aided Experimen¬ 
tation, Fourier Transform Applications, 
Computer controlled systems, Adaptive Sig¬ 
nal processing, Stochastic Processes and 
Micro-processor applications. 

Responsibilities include: development of 
proposals for research in auditory signal pro¬ 
cessing; directing research activities of the 
acoustics/vibration laboratory including de¬ 
velopment of software/hardware for gener¬ 
ation and measurement of various dynamic 
signals, monitoring/calibration of experi¬ 
mental paradigms used in various animal ex¬ 
periments, selection, design and testing of 
new digital signal generation and analysis 
equipment. 37½-hour week; $40,000+ 
per year plus benefits depending on the 
availability of funds and qualifications. 

Send resume and two references by July 
15, 1990 to: 

Research Foundation of the 
State University of New York 
Box - R25 

Plattsburgh, New York 12901 

THE RESEARCH FOUNDATION OF 
SUNY IS AN EQUAL OPPORTUNITY/ 
AFFIRMATIVE ACTION EMPLOYER. 


IOWA STATE UNIVERSITY 
Senior Faculty Positions in 

Electrical & Computer Engineering 

The Department of Electrical and Compu¬ 
ter Engineering at Iowa State University 
seeks applicants for faculty positions at the 
associate or full professor level in the depart¬ 
ment of Electrical and Computer Engineer¬ 
ing starting in August 1990. Particular 
interests include computer networks, distrib¬ 
uted computing, data communications. The 
department supports a networking research 
and teaching lab which contains over 
$250,000 of equipment. Supported net¬ 
works include: Starlan, Ethernet, Token 
Ring, ISDN with plans to add token bus and 
FDDI. Funded projects include: integrated 
voice/data/voice networks, transport proto¬ 
cols for internetworking, and high-speed 
control bus protocols. Applicants should be 
committed to teaching excellence, extension, 
and have a demonstrated ability to obtain peer 
reviewed publications and external funding. 

Interested persons should send a curricu¬ 
lum vita and a list of 3 references to Dr. John 
Lamont, Chairman Faculty Search Commit¬ 
tee, 201 Coover Hall, Iowa State University, 
Ames, IA 50011. Phone 515 294-2663. 
Internet: lamont@isuee1.ee.iastate.edu. 

Iowa State University is an Equal Oppor¬ 
tunity/Affirmative Action Employer. 



Research and Development in Artificial Intelligence, 
Computer Security, Simulation & Modeling, Image 
Understanding and advanced concepts in Information 
Management and System Engineering carry high prior¬ 
ity at Planning Research Corporation. PRC is a company 
both large enough and progressive enough to assure 
that your efforts and accomplishments will have the 
impact they deserve. And the rewards they deserve - 
exceptional support, both intellectually and financially. 

As our group has expanded, so have the opportunities 
to contribute to the state-of-the-art in both internal and 
external R&D efforts, with real-world applications. 


Our Technology Division has a number of specific 
opportunities for experts in three areas: 

ARTIFICIAL INTELLIGENCE: 

Along with an advanced degree and excellent oral and 
written skills, you should have several years' experi¬ 
ence in: Lisp and/or C; stand-alone or distributed 
knowledge-based systems; uncertainty reasoning 
techniques; image understanding; natural language 
understanding; and/or computer security. 

GEOGRAPHICAL INFORMATION SYSTEMS: 

We are seeking persons with advanced degrees, signifi¬ 
cant experience, and recognition in the fields of Geo- 
based Systems, Optical Disk Image-based Systems, 
and/or very-large supercomputing models, such as in 
Atmospheric Dispersion. 

COMPUTER ARCHITECTURE AND 
ENGINEERING: 

With an MS in Computer Science or equivalent, you 
should have at least 2-4 years' experience in several of 
the following: UNIX and/or C; simulation 
tools; distributed information systems; 
computer/network performance, sizing, 
modeling; computer system architecture; 
communication protocols; and/or database 
technologies including relational, SQL, distrib¬ 
uted designs and specific relational DBMSs. 

These positions may require U.S. citizenship, and 
some positions may require a rigorous background 
investigation. A current clearance is a plus. 

For consideration, please send your resume, in 
confidence, to: Planning Research Corporation, 
Department AL-95, 1500 Planning Research Drive, 
McLean, Virginia 22102. An Equal Opportunity Employer 
M/F/H/V. 
BOOK REVIEWS 


Editor: Guy Johnson, Department of Information Technology, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623. 


Systems Engineering: Architecture and Design 

Walter R. Beam (McGraw-Hill, New York, 1990, 355 pp., $44.95) 


This book deals with system design 
fundamentals and is written primarily for 
system designers. The material is drawn 
from Beam’s notes for a graduate sys- 
tems-engineering course. Consequently, 
the book will also be useful to students 
needing an overview of system design 
and architecture. 

Familiarity with some computer termi¬ 
nology and computer systems would be 
helpful for reading this book, though cer¬ 
tainly not mandatory. No rigorous math 
skills are required. The book will be ap¬ 
preciated most by professional designers 
and system architects because of its fre¬ 
quent treatment of real-world situations, 
such as ambiguous requirements, limited 
budgets, compressed schedules, and the 
trade-off of using standard versus custom 
components. 

I compared this book to the text used 
for my electrical engineering systems 
course, Introductory Systems Engineer¬ 
ing (John G. Truxal, McGraw-Hill, 1972). 
While Truxal’s mathematical approach 
works well for circuit designers, Beam’s 
much less mathematical approach seems 
better suited for the system generalist 
(who is, in fact, Beam’s target reader). 

I was initially disappointed that Beam 
classified software as a subsystem rather 
than as a system, but he eventually con¬ 
verted me by showing the “big picture” of 
system design and the fact that hardware 
considerations must come into play for 
good system design. He certainly does 
not neglect using software examples. 

Beam uses story fragments from real- 
world systems to discuss the designs, 
making the style seem somewhat ram¬ 
bling. But that’s not a problem, since this 
book is best read “a section here and a sec¬ 
tion there” to help the designer focus on 
system design fundamentals. The anec¬ 
dotes range from the building of the pyra¬ 
mids, to mousetrap designs, to a distrib¬ 
uted numerical-control system that the 
author helped design and develop. I found 
this last story interesting because I was 
working on a similar project at the same 
time as the author, and we each imple¬ 
mented significantly different solutions. 

The end of each chapter has a complete 
bibliography and a set of exercises that go 
beyond the usual “right answer, wrong 
answer” approach. For example, one 
question asks the reader to define a suit¬ 
able automotive suspension stress test in¬ 
volving an actual vehicle and to provide a 
rationale for each test criterion and pa¬ 
rameter. 

What was most effective about the 
book was that it made me take a few steps 
back from the various problems I was 
working on to get a better handle on the 
system aspects of those problems. 

Randall C. Newcomb 

Functional Software 


OS/2 Database Manager: A Developer’s Guide 

Howard Fosdick (John Wiley and Sons, New York, 1989, 378 pp., $24.95) 


OS/2 Database Manager: A Devel¬ 
oper’s Guide is not an introduction; its 
target audience is current OS/2 develop¬ 
ers and database programmers. The book 
provides an overall perspective and the 
technical information necessary to de¬ 
velop a database system effectively in the 
OS/2 environment. Any DB2, SQL/DS, 
or SQL/400 programmer, database ad¬ 
ministrator, or system administrator who 
wants to learn and use the OS/2 database 
manager will find this book a very good 
foundation. 

The book is more than a technical man¬ 
ual. For example, the section on Views 
discusses that program’s uses and limita¬ 
tions, the section on database-manager 
programming demonstrates a variety of 
programs and ways of encoding SQL 
statements in programs, and another sec¬ 
tion discusses aspects of the database 
manager excluded from the product 
manuals. 

The book is informative, interesting, 
and easy to comprehend, and it relies ex¬ 
tensively on figures, tables, diagrams, 
and examples for clarity. However, any 
later editions should put figures and 
tables on the same page as the text that ref- 
erences them. New editions should also 
repeat frequently referenced tables or in¬ 
clude them as an appendix, rather than re¬ 
ferring the reader back to previous chap¬ 
ters. One nice idea in the current edition is 
the references to the different versions of 
OS/2 (1.0, 1.1, and 1.2), which will be in¬ 
valuable to any serious practitioner. 

The book packs a lot of information 
into its 22 chapters. Part I gives an over¬ 
view of the database manager and ex¬ 
plains basic concepts underlying the OS/ 
2 database manager. Part II thoroughly 
explores SQL and is certainly easier to 
understand than other SQL books on the 
market. It discusses not only SQL’s capa¬ 
bilities for OS/2 database management 
but also its limitations. 

Part III shows how to program with the 
OS/2 database manager, and Part IV of¬ 
fers two chapters on performance issues 
and the future. The discussion of perfor¬ 
mance issues is good, but the three-page 
chapter on compatibility and the future is 
a disappointing letdown in an otherwise 
very good book. 

George Pasieka 

Technical consultant, Toronto 


C++ for C Programmers 

Ira Pohl (Benjamin-Cummings, Redwood City, Calif., 1989, 244 pp., $25.95) 

Using C++ 

Bruce Eckel (McGraw-Hill, New York, 1989, 617 pp., $24.95) 


The C++ programming language is an 
object-oriented superset (actually, al- 
most-superset) of C that is rapidly gain¬ 
ing popularity. For quite a while, the only 
reference was Bjarne Stroustrup’s The 
C++ Programming Language (Addison- 
Wesley, 1986). But that book is now out¬ 
dated, and more C++ books are appear¬ 
ing. This review considers two new books 
on C++ that discuss the language’s new¬ 
est version. 

In C++ for C Programmers, Pohl as¬ 
sumes the reader is familiar with C, while 
Eckel claims Using C++ requires no pre¬ 
vious knowledge of C. However, Eckel 
does not discuss C-style control struc¬ 
tures, stranding the hapless Pascal pro¬ 
grammer who has never seen a “switch.” 
Also, his treatment of pointers, particu¬ 
larly arrays of pointers and multidimen¬ 
sional arrays, is just too sketchy to be 
accessible to a Pascal or Fortran program¬ 
mer. Nevertheless, it will be a useful re¬ 
fresher for the reader who does have some 
knowledge of C-style pointers. 

Both authors sing the praises of object- 
oriented methodology. In introducing the 
concept of object orientation, Pohl talks 
about objects and inheritance only. There 
is no mention of polymorphism. Eckel’s 
definition, on the other hand, is clear and 
to the point: “An object-oriented lan¬ 
guage supports three key features: ab¬ 
stract data typing, inheritance, and poly¬ 
morphism.” 

Eckel’s introduction is a bit dry be¬ 
cause he gives no examples, possibly be¬ 
cause he doesn’t want to overwhelm the 
reader. This way, the reader has to take it 
on faith that object-oriented program¬ 
ming is a good thing. Pohl’s introduction 
does give a quick tour through the lan¬ 
guage, but what a miserable tour it is. 
There are programs to print “C++ is an 
improved C,” to print the distance to the 
moon in kilometers, to convert any inte¬ 
ger number of miles to its equivalent in 
kilometers, and to compute the average of 
exactly three inputs. There is also a point¬ 
less one-page example of inheritance 
with a base class “transact,” derived 
classes “stock” and “call,” and a virtual 
“profit_loss()” function. Since knowl¬ 
edge of C is required for this book, the in¬ 
troduction’s (mostly) poor examples run 
no risk of losing the reader, but they don’t 
advertise the utility of object-oriented 
programming, either. 


The treatment of references is sympto¬ 
matic. Pohl (as did Stroustrup) first intro¬ 
duces the next-to-useless case of a stand¬ 
alone reference variable. Eckel, on the 
other hand, first discusses reference ar¬ 
guments and return values, contrasting 
them with the use of pointers. He then 
notes that references can be declared 
elsewhere but rarely are. He also includes 
a nice discussion of when to use call by 
value, when to use pointers, and when to 
use references. This is stuff a program¬ 
mer can really use. 
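
As a rough illustration of the distinction being praised here (this is not Eckel's code; the Account type and every function name below are hypothetical), the three argument-passing styles and a reference return value might be sketched like this:

```cpp
// Hypothetical record type, used only for illustration.
struct Account {
    int balance;
};

// Call by value: the function works on a copy, so the caller's
// object is left untouched.
inline int balance_after_fee(Account a, int fee) {
    a.balance -= fee;
    return a.balance;
}

// Pointer argument: the caller passes an address explicitly, and
// "no object" can be expressed with a null pointer.
inline bool deposit(Account* a, int amount) {
    if (a == nullptr) return false;
    a->balance += amount;
    return true;
}

// Reference argument: always bound to an object and modified in
// place, without pointer syntax at the call site.
inline void withdraw(Account& a, int amount) {
    a.balance -= amount;
}

// Reference return value: the caller can assign through the call.
inline int& balance_of(Account& a) {
    return a.balance;
}
```

With an Account holding 100, balance_after_fee(acct, 10) returns 90 and leaves acct alone, deposit(&acct, 50) and withdraw(acct, 30) change it in place, and balance_of(acct) = 5 assigns straight into the member.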

Both authors cover the usual ground: 
classes, overloading, inheritance, poly¬ 
morphism, etc. Neither author offers a 
good example to illustrate the mechanics 
of inheritance, resorting to gibberish 
classes such as “X_data” and “derived1.” 
This is a lousy practice used by too many 
authors. There is nothing wrong with us¬ 
ing an occasional class X — as in “X(const 
X&)” — to illustrate a genuinely general 
point. Using “foo” and “bar” is also okay, 
since they have just about the same status 
as “X” and “Y” by now. But why waste the 
reader’s time with “derived1” if “d1” 
would suffice? And why use elaborate 
constructs involving playing-card suits 
(or zoo animals, for that matter) when the 
reader has no earthly interest in their im¬ 
plementation? Ideally, code fragments 
should either come from real production 
code or be reasonably extendable to real 
production code. All of the examples in 
The C Programming Language (Prentice 
Hall, second edition, 1989) by Brian Ker- 
nighan and Dennis Ritchie are of this 
kind. 

Pohl is by far the worse offender in this 
area. Incredibly, his chapter on inheri¬ 
tance offers no useful examples of inheri¬ 
tance with polymorphism. There is a hier¬ 
archy 

proc data 

X_data Y_data 
_ \ 

Z_data 

and an aptly named class “bstree,” but 
neither have virtual functions. 

Both books treat C++ version 2 in the 
last chapter only. This is understandable 
but a bit unfortunate, since version 2 will 
soon be the standard. Also, neither author 
gives a reasonable example of multiple 
inheritance. Eckel discusses virtual base 
classes but gives no real examples. Pohl 
doesn’t even mention them. 

Neither book does a good job of pulling 
together dynamic memory management 
techniques. Eckel covers free-store ob¬ 
ject allocation, including construction/ 
destruction and assignments to “this” in 
Chapter 6, but he only mentions copy-ini¬ 
tializers and reference counts in Chapter 
9. Pohl doesn’t mention reference counts 
at all, and assignments to “this” are way 
over his head. Here’s that entire section: 

One plausible use of ‘this’ involves writ¬ 
ing constructors and destructors that use 
programmer-designed free store manage¬ 
ment functions. For example: 

class X { 

X(unsigned size) {this = (void*) 
malloc(size);} 

~X() {free(this); this = 0;} 

}; 

This is generally done for reasons of effi¬ 
ciency. 

Pohl’s book is just too shallow to war¬ 
rant serious consideration. There are few 
interesting examples, and he steers away 
from all complex issues. 

Eckel’s book has problems, too. The 
typeface choices and typesetting are atro¬ 
cious, and there are many typos. Eckel 
also uses inlines heavily in his classes; 
some of them will break a translator that 
doesn’t have an option to turn inlining 
off. Also, at one point he computes the dot 
product of two vectors p.q, represented in 
Cartesian coordinates, as p.length() * 
q.length() * angle(p,q), a staggering 
waste of computing time over p.x * q.x + 
p.y * q.y. 
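
To make the cost difference concrete, here is a minimal sketch of both forms. The Vec2 type and function names are hypothetical, not taken from Eckel's book, and the geometric version assumes the intended formula is |p| |q| cos(theta), with theta the angle between the vectors.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2-D vector in Cartesian coordinates.
struct Vec2 {
    double x;
    double y;
    double length() const { return std::sqrt(x * x + y * y); }
};

// Component form: two multiplications and one addition.
inline double dot(const Vec2& p, const Vec2& q) {
    return p.x * q.x + p.y * q.y;
}

// Geometric form: two square roots, two arctangents and a cosine
// to arrive at the same number.
inline double dot_via_angle(const Vec2& p, const Vec2& q) {
    double theta = std::atan2(q.y, q.x) - std::atan2(p.y, p.x);
    return p.length() * q.length() * std::cos(theta);
}

// Sanity check that the two forms agree on one sample input.
inline void check_forms_agree() {
    Vec2 p{1.0, 2.0};
    Vec2 q{3.0, 4.0};
    assert(std::fabs(dot(p, q) - dot_via_angle(p, q)) < 1e-9);
}
```

Both functions give the same result up to rounding, but the component form needs no library calls at all, which is the reviewer's point.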

On the other hand, there is much to like 
about Eckel’s book. He makes sound rec¬ 
ommendations for practical program¬ 
ming, and many of his examples are inter¬ 
esting enough to warrant study. Consider 
it, not as a textbook or reference, but as a 
report from someone who has faced the 
challenges of C++ programming and 
lived to tell the tale. 


Cay S. Horstmann 
San Jose State University 


July 1990 
Managing for Profit in the Semiconductor Industry 

Robert McIvor (Prentice Hall, Englewood Cliffs, N.J., 1989, 516 pp., $34) 


In this age of “one-minute manage¬ 
ment” fixes and evangelistic business gu¬ 
rus, I would normally be leery of a book 
with a title like Managing for Profit in the 
Semiconductor Industry. However, the 
book lives up to its ambitious title, and au¬ 
thor Robert McIvor has the practical ex¬ 
perience and knowledge to legitimize his 
opinions and pronouncements. McIvor is 
employed by Motorola and obviously has 
been involved in their self-examination 
and changes. 

McIvor shows how to upgrade an es¬ 
tablished corporation into a world-class 
manufacturing concern. His stated goal is 
the creation of a corporation capable of 
sustained competition with the “Five Ti¬ 
gers” of “Japan, Inc.” He does not avoid 
treading on toes, and he does not offer 
apologies for past poor performances. 
Indeed, he states, “We cannot hide behind 
excuses and blame others any longer. We 
must and can learn how to compete on¬ 
shore. There is more to competing than 
direct labour cost.” McIvor logically ana¬ 
lyzes problems encountered on the way to 
this goal and offers solutions in a forth¬ 
right manner that is quite refreshing. 

The 18 chapters take the reader through 
the various steps required to reorganize a 
large corporation on its many levels. The 
topics range from business strategies and 
product introductions to appropriate ac¬ 
counting systems, each described in an 
analytical and detailed manner with lib¬ 
eral use of graphs, tables, flowcharts, and 
diagrams. 

While many of the topics are rather dry 
and complex, McIvor’s interest and en¬ 
thusiasm capture the reader’s interest 
early in the book, although his efforts to 
maintain that level of interest sometimes 
make McIvor sound like a cross between 
Tom Peters and Lee Iacocca. 

Many of McIvor’s suggested proce¬ 
dures and ideas seem to be based on his 
own involvement at Motorola. Problems 
are identified, solutions sought, and re¬ 
sults reviewed in a manner reminiscent of 
classic corporate case studies. The stud¬ 
ies here include forming strategic alli¬ 
ances, implementing computer-inte¬ 
grated manufacturing, and building an 
integrated business information system. 

One complaint I do have with the book 
is its organization. The author wanders 
from his primary path several times for 
short side trips. For example, in the chap¬ 
ter on economics and business cycles, 
McIvor reprints a Wall Street Journal ar¬ 
ticle on the events of “Terrible Tuesday.” 
While I found this quite interesting, I am 
not sure that most readers would find its 
inclusion worthwhile. 

Likewise, the chapter reviewing basic 
semiconductor physics and manufactur¬ 
ing processes would probably be of use to 
only a very small portion of the intended 
audience. However, perhaps to explain 
these side trips, McIvor states in the fore¬ 
word that managers need to know a little 
something about everything. 

This book offers invaluable informa¬ 
tion and insights for manufacturing pro¬ 
fessionals who become executives or 
who just aspire to the post. The chapters 
on economics, business forecasting, pro¬ 
duction systems, new product introduc¬ 
tions, and definitions of common terms in 
profit-and-loss reports offer a wide range 
of useful and valuable information. 

While McIvor’s book might require a 
relatively large commitment of time, it is 
likely to become a valuable resource. 

Any time spent reading it would be time 
well spent by either a novice manager or 
an experienced manager looking for new 
ideas. Indeed, it could prove beneficial at 
your next performance evaluation just to 
have been seen reading it. 

Robert Stratton 

McCarthy Tetrault 


Practical LANs Analyzed 

Franz-Joachim Kauffels (Ellis Horwood Ltd., Chichester, England, 1989, 334 pp., $44.95) 


This book has a weak beginning, but it 
makes up for it in the middle and end. My 
advice is, learn a little about data commu¬ 
nications elsewhere, and then skip the 
first two chapters. You won’t miss much, 
and you’ll move right into the good stuff. 

The introductory chapter, intended to 
provide an overview of local area net¬ 
works, is confusing. The author attempts 
to tie these systems to applications, but 
his discussion is so theoretical and gen¬ 
eral that he is unsuccessful. 

He follows with a general discussion of 
data communications. While it is an in¬ 
troduction to the field, the reader must 
have at least some background in elec¬ 
tricity and the electrical properties of 
substances to benefit from the discus¬ 
sion. Even with such a background, how¬ 
ever, most people would find this chapter 
difficult to comprehend. 

Also, these first two chapters contain 
too many examples of bad English, 
whether caused by bad editing or translat¬ 
ing. Thankfully, such errors are far less 
common in the rest of the book. 

In Chapter 3, the author begins to focus 
less on theory and more on real aspects of 
real LANs. He is far better at this type of 
material. The fine discussion of different 
topologies includes not only the standard 
bus and ring, but also tree, star, plait, and 
various hybrid topologies. The reader 
comes away with a good understanding of 
how design requirements influence the 
choice of topology. 

The author then discusses a number of 
different access-control methods, di¬ 
vided into token-passing and contention 
schemes. Several variations of each 
method are clearly explained, and there is 
some good mathematical analysis of their 
performance. The author also offers a de¬ 
tailed analysis of Ethernet as a specific 
example of a working system. 

The chapter on standards introduces 
the Open Systems Interconnection refer¬ 
ence model as well as other standards from 
the European Computer Manufacturers 
Association, the IEEE, and the Defense 
Dept. The Manufacturing Automation 
Protocol suite also receives a more exten¬ 
sive treatment here than is usually found 
in texts of this sort. By postponing his dis¬ 
cussion of standards until this point, the 
author lets the reader evaluate and recog¬ 
nize the value of the various standards. 

Since the author intended to deal with 
“practical LANs,” I was especially 
pleased to see an entire chapter devoted to 
Digital Equipment Corporation’s Digital 
Network Architecture and IBM’s Sys¬ 
tems Network Architecture. These are 
certainly two of the most important and 
widespread networking architectures, 
and most new graduates will encounter 
these before they see OSI. By focusing 
only on the LAN-related aspects of these 
two architectures, the author provides a 
good basis for understanding their full 
breadth. 

The concluding chapter has the obliga¬ 
tory look at the future, but the author 
keeps his discussion practical by examin¬ 
ing items that should appear in the near 
future. Some of the systems, like FDDI 
and Datapipe, are already implemented 
and should become widespread soon. 

The book’s European flavor is unmis¬ 
takable. While this is the first English 
translation, it is the third edition of the 
German text. A significant number of the 
references are to German-language mate¬ 
rials. While the IEEE standards are not 
slighted, the ECMA standards are dis¬ 
cussed earlier and more thoroughly. Also, 
while MAP is a rarity stateside, its popu¬ 
larity in Europe accounts for its frequent 
mention in this text. Someone anticipat¬ 
ing the changes planned for Europe in 
1992 might find this book a good blend of 
European and American thinking. 

Since this book is based on the author’s
lectures at the University of Bonn, it
should be quite accessible to upper-level
undergraduates and would be an excellent
supplemental text for any course in
general computer networks. There are no
exercises.

Practicing professionals should also 
find something of interest here. If you can 
pick up the elementary ideas of data communications
from another source, this
book will give you a deeper understanding
of local area networks.

Christopher J. Jasen 

Occidental Chemical Corporation 


IEEE CS MAGAZINES


June 1990 IEEE Micro 

Multiplexed Buses: The Endian Wars Continue,
David James

The processing order of bytes can differ within 
a processor and so can the transmitting order 
between nodes on a bus. Adding it all up produces
superior bus designs.

The 68040 Processor: Part 2, Memory Design
and Chip Verification, Robin W. Edenfield
et al.

An address-translation cache features a 4- or 
8-Kbyte selectable page size while translation 
registers provide transparent mapping of segments
up to 4 Gbytes.

The TMS390C602A Floating-Point Coprocessor
for Sparc Systems, Merrick Darley et al.

A dedicated floating-point coprocessor improves
the performance of a two-chip RISC
processor based on the Sparc architecture.

Motorola’s 88000 Family Architecture,
Mitch Alsup

A new generation of architectures emphasizes 
performance by means of pipelined data paths, 
cache memories, and optimizing compilers. 

Special Feature: The Gmicro/300 32-Bit
Microprocessor, Takeshi Kitahara and Taizo 
Satoh 

Executing an instruction with a memory operand
and a register operand in one clock cycle
presents no problem for this TRON-architec- 


Capturing Knowledge through Top-Down 
Induction of Decision Trees, N.A.B. Gray 
Examining knowledge representation problems
and alternatives in the context of chess.

Validating Expert Systems, Timothy J. 
O’Leary et al. 

Presenting a case history of a formal prototype 
validation paradigm. 

Expert System Security, Daniel E. O’Leary 
Analyzing unique factors, possible solutions, 
and effects on users. 

Maintainability Techniques in Developing 
Large Expert Systems, David S. Prerau et al. 
Using current-generation tools to produce 
comprehensible and maintainable systems. 

June 1990 IEEE Design & 
Test of Computers 

Tuning VHDL for Multivalue Logic Modeling,
J.R. Armstrong

The package construct in VHDL offers designers
the flexibility to meet the challenges of
multivalue logic modeling. 

A Minimalist Approach to VHDL Logic 
Modeling, Paul J. Menchini 
VHDL vendors and users have formed the
VHDL Design Exchange Group, whose charter
is to develop an industry-consensus logic
model that is portable across many existing
VHDL tools and a method for its use.

A VHDL Standard Package for Logic Modeling,
David R. Coelho
An in-depth look at the VHDL package construct
shows how designers can create VHDL
environments that correspond to the more tailored
approaches of traditional hardware description
languages.

A Value System for Switch-Level Modeling, 
Steven P. Smith and Ramon D. Acosta 
This approach to switch modeling provides an 
excellent compromise between accuracy and 
performance and requires only minor modifications
to basic gate-level simulators.

Transparent Logic Modeling in VHDL,
Joanne E. Degroat

With these modeling conventions and VHDL 
library techniques, designers can transparently
map between multivalue logic systems
without modifying the model itself. 

Logic Modeling in Waves, Alfred S. Gilman 

The waveform and vector exchange specification
allows design and test engineers to exchange
information between simulator and
tester environments by capturing both the 
logic value sets in simulation and the pin codes 
used in contemporary test-vector languages. 

A D&T Roundtable - Does VLSI Education
Meet Industry’s Needs?


June 1990 IEEE Expert 

Planning for Space Telerobotics: The 
Remote Mission Specialist, Mark Rokey and 
Sven Grenander 

Automating command generation and planning
in spacecraft control.

Representation Selection for Constraint 
Satisfaction: A Case Study Using n-Queens, 
Bernard A. Nadel 

Choosing the best representation for problem 
solving. 

Expert System Development: A Retrospective
View of Five Systems, Adam Irgon et al.
Reviewing lessons learned: why be a Columbus
every time?
Call for Papers

THE SECOND INTERNATIONAL CONFERENCE ON 
SYSTEMS INTEGRATION 


IEEE COMPUTER SOCIETY

THE INSTITUTE OF ELECTRICAL AND
ELECTRONICS ENGINEERS, INC.

ASSOCIATION FOR
COMPUTING MACHINERY


Headquarters Plaza Hotel, Morristown, New Jersey 
April 22-25, 1991


Theme: Managing Large-Scale Integration in the 1990s. 


This conference focuses on the integration of technologies, processes and systems, and the development of [...]
enabling solutions to complex multi-disciplinary problems. A special emphasis is placed on the management of [...].
The conference will provide an international and interdisciplinary forum in which researchers and practitioners can share novel research,
engineering development, and management experiences. Papers should deal with recent work in theory, design, implementation, utilization
and experiences of integrated processes and systems. Topics to be addressed include, but are not limited to:

• Process Modeling and Characterization
• Re-engineering and Process Simplification
• Integration Process in Industry Applications
• Next Generation Computer Aided Environment for Engineering Design, Manufacturing, System Development, etc.
• Role of Human Engineering in Large-scale Integration
• Experiences of Large-scale Integration Projects
• The [...] of Systems Integration for Manpower Skills
• Quality Control in Large-scale Integration
• System Architecture for Integration
• Automation of Processes and Systems

Information and Instructions for Authors: Authors are cordially invited to submit original technical papers to the Program Chairman no
later than September 14, 1990. All papers must be in English, typed in double spaced format and may not exceed 6,000 words. Each
submission should provide a cover page containing author(s), affiliation(s), complete address(es), identification of principal author and
telephone number. Also include SIX copies of complete text with a title and abstract. Notice of acceptance will be mailed to the principal
author(s) by December 3, 1990. If accepted, the author(s) will prepare the final manuscript in time for inclusion in the
proceedings and will present the paper at the conference; otherwise, the author(s) will incur a page charge. Authors of accepted papers
must sign a copyright release form.


Please send SIX copies of your paper(s) to 

Program Chairperson: 

Dr. Raymond T. Yeh 

c/o Prof. Peter A. Ng

Dept. of Computer & Information Science

New Jersey Institute of Technology 

University Heights 

Newark, NJ 07102, U.S.A. 


Paper Arrival Deadline:
September 14, 1990

Acceptance Notification Deadline:
December 3, 1990

Final Manuscript Inclusion
Deadline: January 7, 1991


For further information contact Peter A. Ng, Department of Computer and Information Science, New Jersey Institute of 
Technology, University Heights, Newark, NJ 07102, U.S.A., (201) 596-3387, ngj3@vienna.njit.edu 
Local Arrangement Chair:

Steering Committee Chair: Peter A. Ng, NJIT 










TUTORIALS: October 29-30, 1990 ■ CONFERENCE: October 31-November 2, 1990


The Fourteenth Annual International 

Computer Software & 
Applications Conference 



Holiday Inn 
Mart Plaza 
Chicago, Illinois 


TUTORIALS 

Monday, October 29, 1990

■ Visual Programming Environments 

by E. P. Glinert, Rensselaer Polytechnic Institute 

■ Real-Time Systems and Applications 

by H. W. Tyrer, University of Missouri-Columbia 


Tuesday, October 30, 1990

■ Reverse Engineering, Design Recovery and CASE 

by E. J. Chikofsky, Index Technology Corporation

■ Object Oriented Programming 

by R. K. Ege and M. T. Milani, Florida International University 


CONFERENCE-AT-A-GLANCE 


COMPSAC '90 


Wednesday, October 31, 1990

Thursday, November 1, 1990

Friday, November 2, 1990
COMPSAC '90 Registration Chair
5642 South Harper Avenue 
Chicago, Illinois 60637, USA 
(312) 752-4562; FAX: (312) 752-4562 


David Carney 

COMPSAC '90 Conference Chair 
AT&T Technologies, Inc. 

475 South Street 

Morristown, New Jersey 07962, USA 
(201) 631-6500; FAX: (201) 631-5449 


IEEE Computer Society 
13, Avenue de l'Aquilon 
B-1200 Brussels, Belgium 
32-2-770-2198 
FAX: 32-2-770-8505 


For additional information, contact: 

Ifay F. Chang 

COMPSAC '90 Program Chair 

IBM Corporation, T.J. Watson Research Center

P.O. Box 704

Yorktown Heights, New York 10598, USA
(914) 784-7825; FAX: (914) 784-6211 


Stephen S. Yau 

COMPSAC '90 Steering Committee 
Computer and Information Science Dept. 
301 CSE Building 
University of Florida 
Gainesville, Florida 32611, USA 